ε-Machines are minimal, unifilar representations of stationary stochastic processes. They were
originally defined in the history machine sense: as machines whose states are the equivalence classes
of infinite histories with the same probability distribution over futures. In analyzing synchronization,
though, an alternative generator definition was given: unifilar edge-label hidden Markov models
with probabilistically distinct states. The key difference is that history ε-machines are defined by
a process, whereas generator ε-machines define a process. We show here that these two definitions
are equivalent.

PACS numbers: 02.50.-r 89.70.+c 05.45.Tp 02.50.Ey

Keywords: hidden Markov model, history epsilon-machine, generator epsilon-machine, measure theory,
synchronization, uniqueness

I. INTRODUCTION

The ε-machine $M$ of a stationary stochastic process $\mathcal{P} = (X_L)$ is its minimal, unifilar presentation. Originally, these
machines were introduced as predictive models in the context of dynamical systems. Specifically, the machine
states there were defined as equivalence classes of infinite past sequences $\overleftarrow{x} = \ldots x_{-2} x_{-1}$ that lead to the same
predictions over future sequences $\overrightarrow{x} = x_0 x_1 \ldots$. This notion has subsequently been expanded upon and formalized
in Refs. [5H5]. Independently, a similar formulation has also been given in Refs. [5] and [7], wherein the equivalence
classes, or ε-machine states, are referred to as future measures.

In a recent study of synchronization [8, 9], however, an alternative formulation was given for ε-machines as process
generators. Specifically, ε-machines were defined there as irreducible, edge-label hidden Markov models with unifilar
transitions and probabilistically distinct states. Rather than being defined by a process, a generator ε-machine
defines a process: the process it generates. The generator formulation is often easier to work with than the history
machine formulation, since it may be difficult to determine the equivalence classes of histories directly from some
other description of the process, such as the equations of motion of a dynamical system from which the process is
derived.

Here, we establish the equivalence of the two formulations in the finite-state case: the original history machine
definition introduced in Refs. [2, 3] and the generator machine definition used in Refs. [8, 9]. It has long been
assumed that the two are equivalent, without formally specifying the generator definition; our results make the
equivalence explicit and establish it rigorously.

Reference [5] also gives a rigorous formulation of the history ε-machines. Its results imply equivalence for a more
general class of machines, not just finite-state ones. However, the statements given there are in a different language and,
moreover, it is not initially clear that equivalence is their subject. Furthermore, though its proofs, especially those
related to Theorem 1 below, are shorter and, perhaps, cleaner than ours, they require more machinery and are not
as direct. Therefore, we feel that our demonstration of equivalence in the finite-state case, with its more elementary
proofs, is useful and provides good intuition.

To parallel the generator machine definition, when defining history ε-machines here we assume that the process is
not only stationary but also ergodic, and that the process alphabet is finite. Neither of these assumptions
is strictly necessary, however; only stationarity is actually needed. The history ε-machine definition can easily be extended to
non-ergodic stationary processes and countable alphabets [31 \S\ . 

II. BACKGROUND

A. Processes

There are several ways to define a stochastic process. Perhaps the most traditional is simply as a sequence of
random variables $(X_L)$ on a common probability space $\Omega$. However, in the following it will be convenient to use
a slightly different, but equivalent, construction in which a process is itself a probability space whose sample space
consists of bi-infinite sequences $\overleftrightarrow{x} = \ldots x_{-1} x_0 x_1 \ldots$. Throughout, we restrict our attention to processes over a finite
alphabet of symbols $\mathcal{X}$. We denote by $\mathcal{X}^*$ the set of all words $w$ of finite positive length consisting of symbols in $\mathcal{X}$
and, for a word $w \in \mathcal{X}^*$, we write $|w|$ for its length. Note that we deviate slightly from the standard convention here
and explicitly exclude the null word $\lambda$ from $\mathcal{X}^*$.

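Definition 1. Let $\mathcal{X}$ be a finite set. A process $\mathcal{P}$ over the alphabet $\mathcal{X}$ is a probability space $(\mathcal{X}^{\mathbb{Z}}, \mathbb{X}, \mathbb{P})$ where:

• $\mathcal{X}^{\mathbb{Z}}$ is the set of all bi-infinite sequences of symbols in $\mathcal{X}$: $\mathcal{X}^{\mathbb{Z}} = \{\overleftrightarrow{x} = \ldots x_{-1} x_0 x_1 \ldots : x_L \in \mathcal{X}, \text{ for all } L \in \mathbb{Z}\}$.

• $\mathbb{X}$ is the σ-algebra generated by finite cylinder sets of the form $A_{w,L} = \{\overleftrightarrow{x} \in \mathcal{X}^{\mathbb{Z}} : x_L \ldots x_{L+|w|-1} = w\}$; i.e.,
$\mathbb{X} = \sigma(\{A_{w,L} : w \in \mathcal{X}^*, L \in \mathbb{Z}\})$.

• $\mathbb{P}$ is a probability measure on the measurable space $(\mathcal{X}^{\mathbb{Z}}, \mathbb{X})$.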

Remark. We implicitly assume here that for each symbol $x \in \mathcal{X}$, $\mathbb{P}(A_{x,L}) > 0$ for some $L \in \mathbb{N}$. Otherwise, the symbol
$x$ is useless and the process can be restricted to the alphabet $\mathcal{X} \setminus \{x\}$.

Here, we are primarily concerned with stationary, ergodic processes. We recall the definitions below. 

Definition 2. A process $\mathcal{P}$ is stationary if for each $w \in \mathcal{X}^*$, $\mathbb{P}(A_{w,L}) = \mathbb{P}(A_{w,0})$ for all $L \in \mathbb{Z}$. In this case, we
simply write $\mathbb{P}(w)$ for the probability of the word $w$ with respect to the process: $\mathbb{P}(w) = \mathbb{P}(A_{w,0})$. A stationary process
is defined entirely by the word probabilities $\mathbb{P}(w)$, $w \in \mathcal{X}^*$.

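Definition 3. For a length-$L$ sequence $v^L = v_0 v_1 \ldots v_{L-1}$ of symbols in $\mathcal{X}$ and word $w \in \mathcal{X}^*$ with $|w| \leq L$, let $p(w|v^L)$
be the empirical probability of the word $w$ in the sequence $v^L$:

$$ p(w|v^L) = \frac{\#\ \text{occurrences of } w \text{ in } v^L}{\#\ \text{length-}|w| \text{ words in } v^L} = \frac{\#\ \text{occurrences of } w \text{ in } v^L}{L - |w| + 1} . $$

A stationary process $\mathcal{P}$ is ergodic if, for $\mathbb{P}$ a.e. $\overleftrightarrow{x} \in \mathcal{X}^{\mathbb{Z}}$:

$$ \lim_{L \to \infty} p(w|\overrightarrow{x}^L) = \mathbb{P}(w) \text{ for each } w \in \mathcal{X}^* , \tag{1} $$

where $\overrightarrow{x}^L = x_0 x_1 \ldots x_{L-1}$ denotes the length-$L$ future of a bi-infinite sequence $\overleftrightarrow{x} = \ldots x_{-1} x_0 x_1 \ldots$.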
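
The empirical probability of Definition 3 is a sliding-window count, which the following minimal sketch makes concrete. The function name and the use of Python strings for symbol sequences are illustrative assumptions, not part of the original development.

```python
def empirical_word_probability(w: str, v: str) -> float:
    """Empirical probability p(w|v^L) of word w in the finite sequence v (Definition 3)."""
    if not 0 < len(w) <= len(v):
        raise ValueError("requires 0 < |w| <= L")
    n_windows = len(v) - len(w) + 1  # number of length-|w| words in v
    # Count (possibly overlapping) occurrences of w among those windows.
    n_hits = sum(1 for i in range(n_windows) if v[i:i + len(w)] == w)
    return n_hits / n_windows


if __name__ == "__main__":
    print(empirical_word_probability("11", "0110110"))
```

For an ergodic process, Eq. (1) says that this quantity, evaluated on ever longer futures of a typical sequence, converges to the word probability.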

Remark. One could consider, instead, limits of empirical word probabilities on finite length pasts $\overleftarrow{x}^L = x_{-L} \ldots x_{-1}$
or finite length past-futures $\overleftrightarrow{x}^L = \overleftarrow{x}^L \overrightarrow{x}^L$, but these formulations are all equivalent for stationary processes.

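For a stationary process $\mathcal{P}$ and words $w, v \in \mathcal{X}^*$ with $\mathbb{P}(v) > 0$, we define $\mathbb{P}(w|v)$ as the probability that the word
$v$ is followed by the word $w$ in a bi-infinite sequence $\overleftrightarrow{x}$:

$$ \mathbb{P}(w|v) = \mathbb{P}(A_{w,0} | A_{v,-|v|}) = \mathbb{P}(A_{v,-|v|} \cap A_{w,0}) / \mathbb{P}(A_{v,-|v|}) = \mathbb{P}(vw) / \mathbb{P}(v) . \tag{2} $$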

The following facts concerning word probabilities and conditional word probabilities for a stationary process come
immediately from the definitions. They will be used repeatedly throughout our development, without further mention.
For any words $u, v, w \in \mathcal{X}^*$:

2. $\mathbb{P}(w) \geq \mathbb{P}(wv)$ and $\mathbb{P}(w) \geq \mathbb{P}(vw)$;

3. If $\mathbb{P}(w) > 0$, $\sum_{x \in \mathcal{X}} \mathbb{P}(x|w) = 1$;

4. If $\mathbb{P}(u) > 0$, $\mathbb{P}(v|u) \geq \mathbb{P}(vw|u)$; and

5. If $\mathbb{P}(u) > 0$ and $\mathbb{P}(uv) > 0$, $\mathbb{P}(vw|u) = \mathbb{P}(v|u) \cdot \mathbb{P}(w|uv)$.
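
As an illustration of how these facts follow from the definitions, the last item (the chain rule) can be obtained by applying Eq. (2) three times. This is a short worked derivation added for convenience, and it uses only the conditions $\mathbb{P}(u) > 0$ and $\mathbb{P}(uv) > 0$ stated in that item.

```latex
\begin{align*}
\mathbb{P}(vw|u)
  &= \frac{\mathbb{P}(uvw)}{\mathbb{P}(u)}
   = \frac{\mathbb{P}(uv)}{\mathbb{P}(u)} \cdot \frac{\mathbb{P}(uvw)}{\mathbb{P}(uv)}
   = \mathbb{P}(v|u) \cdot \mathbb{P}(w|uv) .
\end{align*}
```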

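Finally, the history σ-algebra $\mathbb{H}$ for a process $\mathcal{P} = (\mathcal{X}^{\mathbb{Z}}, \mathbb{X}, \mathbb{P})$ is defined to be the σ-algebra generated by cylinder sets
of all finite length histories. That is,

$$ \mathbb{H} = \sigma\left( \bigcup_{L=1}^{\infty} \mathbb{H}_L \right) , \tag{3} $$

where

$$ \mathbb{H}_L = \sigma\left( \{ A_{w,-|w|} : |w| = L \} \right) . \tag{4} $$

$\mathbb{H}$ will be important in the construction of the history ε-machine in Sec. II C.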



B. Generator ε-Machines

Generator ε-machines are essentially minimal, unifilar edge-label hidden Markov models. We review this definition
in more detail below, beginning with general hidden Markov models and then restricting to ε-machines.

Definition 4. An edge-label hidden Markov model (HMM) is a triple $(\mathcal{S}, \mathcal{X}, \{T^{(x)}\})$ where:

• $\mathcal{S} = \{\sigma_1, \ldots, \sigma_N\}$ is a finite set of states,

• $\mathcal{X}$ is a finite alphabet of symbols, and

• $T^{(x)}$, $x \in \mathcal{X}$, are symbol-labeled transition matrices. $T^{(x)}_{ij} \geq 0$ represents the probability of transitioning from
state $\sigma_i$ to state $\sigma_j$ on symbol $x$.

We also denote the overall state-to-state transition matrix for an HMM as $T$: $T = \sum_{x \in \mathcal{X}} T^{(x)}$. $T_{ij}$ is the overall
probability of transitioning from state $\sigma_i$ to state $\sigma_j$, regardless of symbol. The matrix $T$ is stochastic: $\sum_{j=1}^{N} T_{ij} = 1$
for each $i$.

An HMM can also be represented graphically, as a directed graph with labeled edges. The vertices are the states
$\sigma_1, \ldots, \sigma_N$ and, for each $i, j, x$ with $T^{(x)}_{ij} > 0$, there is a directed edge from state $\sigma_i$ to state $\sigma_j$ labeled $p|x$ for the
symbol $x$ and transition probability $p = T^{(x)}_{ij}$. The transition probabilities are normalized so that their sum on all
outgoing edges from each state $\sigma_k$ is 1.

Example. Figure 1 depicts an HMM for the Even Process. The support for this process consists of all binary sequences
in which blocks of uninterrupted 1s are even in length, bounded by 0s. After each even length is reached, there is a
probability $p$ of breaking the block of 1s by inserting a 0.




FIG. 1: A hidden Markov model (the ε-machine) for the Even Process. The machine has two internal states $\mathcal{S} = \{\sigma_1, \sigma_2\}$, a
two-symbol alphabet $\mathcal{X} = \{0, 1\}$, and a single parameter $p \in (0,1)$ that controls the transition probabilities. The graphical
representation is presented on the left, with the corresponding transition matrices on the right. In the graphical representation,
transitions denote the probability $p$ of generating symbol $x$ as $p|x$.

The operation of an HMM may be thought of as a weighted random walk on the associated graph. That is, from
the current state $\sigma_i$, the next state $\sigma_j$ is determined by selecting an outgoing edge from $\sigma_i$ according to their relative
probabilities. Having selected a transition, the HMM then moves to the new state and outputs the symbol $x$ labeling
this edge.

The state sequence determined in such a fashion is simply a Markov chain with transition matrix $T$. However,
we are interested not simply in the HMM's state sequence, but rather the associated sequence of output symbols it
generates. We assume that an observer of the HMM may directly observe this sequence of output symbols, but not
the associated sequence of states.
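
The weighted random walk just described can be sketched in a few lines. The sketch below is illustrative only: the helper names are hypothetical, states are indexed from 0, and the Even Process matrices are an assumption inferred from the verbal description of the process (a 0 is emitted with probability $p$ from the first state, and the second state always emits a 1 and returns), since the matrices of Fig. 1 are not reproduced in the text.

```python
import random


def even_process_hmm(p):
    """Symbol-labeled matrices T[x][i][j] for a two-state Even Process HMM.

    Assumption: index 0 plays the role of sigma_1 and index 1 of sigma_2; the
    entries are inferred from the verbal description, not copied from Fig. 1.
    """
    return {
        "0": [[p, 0.0], [0.0, 0.0]],        # sigma_1 --p|0--> sigma_1
        "1": [[0.0, 1.0 - p], [1.0, 0.0]],  # sigma_1 --(1-p)|1--> sigma_2, sigma_2 --1|1--> sigma_1
    }


def generate(T, start, length, rng):
    """Weighted random walk on the HMM graph; returns (output word, visited states)."""
    n_states = len(T[next(iter(T))])
    state, symbols, states = start, [], []
    for _ in range(length):
        # Outgoing edges (symbol, target) of the current state and their probabilities.
        edges = [(x, j) for x in sorted(T) for j in range(n_states) if T[x][state][j] > 0]
        weights = [T[x][state][j] for x, j in edges]
        x, state = rng.choices(edges, weights=weights, k=1)[0]
        symbols.append(x)
        states.append(state)
    return "".join(symbols), states


if __name__ == "__main__":
    word, _ = generate(even_process_hmm(0.5), start=0, length=40, rng=random.Random(0))
    print(word)
```

An observer in the sense above sees only the returned word; the list of visited states is hidden.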

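Formally, from an initial state $\sigma_i$ the probability that the HMM next outputs symbol $x$ and transitions to state $\sigma_j$
is:

$$ \mathbb{P}_{\sigma_i}(x, \sigma_j) = T^{(x)}_{ij} . \tag{5} $$

And, the probability of longer sequences is computed inductively. Thus, for an initial state $\sigma_i = \sigma_{i_0}$ the probability
that the HMM outputs a word $w = w_0 \ldots w_{L-1}$, $w_l \in \mathcal{X}$, while following the state path $s = \sigma_{i_1} \ldots \sigma_{i_L}$ in the next $L$
steps is:

$$ \mathbb{P}_{\sigma_i}(w, s) = \prod_{l=0}^{L-1} T^{(w_l)}_{i_l, i_{l+1}} . \tag{6} $$

If the initial state is chosen according to some distribution $\rho = (\rho_1, \ldots, \rho_N)$ rather than as a fixed state $\sigma_i$, we have
by linearity:

$$ \mathbb{P}_{\rho}(x, \sigma_j) = \sum_i \rho_i \cdot \mathbb{P}_{\sigma_i}(x, \sigma_j) \quad \text{and} \tag{7} $$

$$ \mathbb{P}_{\rho}(w, s) = \sum_i \rho_i \cdot \mathbb{P}_{\sigma_i}(w, s) . \tag{8} $$

The overall probabilities of next generating a symbol $x$ or word $w = w_0 \ldots w_{L-1}$ from a given state $\sigma_i$ are computed
by summing over all possible associated target states or state sequences:

$$ \mathbb{P}_{\sigma_i}(x) = \sum_j \mathbb{P}_{\sigma_i}(x, \sigma_j) = \| e_i T^{(x)} \|_1 \quad \text{and} \tag{9} $$

$$ \mathbb{P}_{\sigma_i}(w) = \sum_{\{s : |s| = L\}} \mathbb{P}_{\sigma_i}(w, s) = \| e_i T^{(w)} \|_1 , \tag{10} $$

respectively, where $e_i = (0, \ldots, 1, \ldots, 0)$ is the $i$th standard basis vector in $\mathbb{R}^N$ and

$$ T^{(w)} = T^{(w_0 \ldots w_{L-1})} = \prod_{l=0}^{L-1} T^{(w_l)} . \tag{11} $$

Finally, the overall probabilities of next generating a symbol $x$ or word $w = w_0 \ldots w_{L-1}$ from an initial state
distribution $\rho$ are, respectively:

$$ \mathbb{P}_{\rho}(x) = \sum_i \rho_i \cdot \mathbb{P}_{\sigma_i}(x) = \| \rho T^{(x)} \|_1 \quad \text{and} \tag{12} $$

$$ \mathbb{P}_{\rho}(w) = \sum_i \rho_i \cdot \mathbb{P}_{\sigma_i}(w) = \| \rho T^{(w)} \|_1 . \tag{13} $$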

If the graph $G$ associated with a given HMM is strongly connected, then the corresponding Markov chain over states
is irreducible and the state-to-state transition matrix $T$ has a unique stationary distribution $\pi$ satisfying $\pi = \pi T$ [10].
In this case, we may define a stationary process $\mathcal{P} = (\mathcal{X}^{\mathbb{Z}}, \mathbb{X}, \mathbb{P})$ by the word probabilities obtained from choosing the
initial state according to $\pi$. That is, for any word $w \in \mathcal{X}^*$:

$$ \mathbb{P}(w) = \mathbb{P}_{\pi}(w) = \| \pi T^{(w)} \|_1 . \tag{14} $$

Strong connectivity also implies the process $\mathcal{P}$ is ergodic, as it is a "pointwise" function of the irreducible Markov
chain over edges, which is itself ergodic [10]. That is, at each time step the symbol labeling the edge is a deterministic
function of the edge.
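
The word probabilities of Eq. (14) are straightforward to evaluate numerically. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the stationary distribution is approximated by iterating the "lazy" chain $(T + I)/2$ (which has the same fixed point as $T$ and avoids periodicity issues), and the Even Process matrices are again inferred from the verbal description rather than taken from Fig. 1.

```python
def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


def stationary_distribution(T, iters=2000):
    """Approximate pi = pi T for T = sum_x T[x] by iterating the lazy chain (T + I)/2."""
    n = len(T[next(iter(T))])
    total = [[sum(T[x][i][j] for x in T) for j in range(n)] for i in range(n)]
    pi = [1.0 / n] * n
    for _ in range(iters):
        step = vec_mat(pi, total)
        pi = [(a + b) / 2.0 for a, b in zip(pi, step)]  # average of pi and pi T
    return pi


def word_probability(T, rho, w):
    """||rho T^(w)||_1 with T^(w) = T^(w_0) ... T^(w_{L-1}), as in Eqs. (13) and (14)."""
    v = list(rho)
    for x in w:
        v = vec_mat(v, T[x])
    return sum(v)


if __name__ == "__main__":
    p = 0.5
    # Assumed Even Process matrices (index 0 = sigma_1, index 1 = sigma_2).
    T = {"0": [[p, 0.0], [0.0, 0.0]], "1": [[0.0, 1.0 - p], [1.0, 0.0]]}
    pi = stationary_distribution(T)
    print(pi, word_probability(T, pi, "0110"), word_probability(T, pi, "010"))
```

With these assumed matrices the word 010 receives probability zero, consistent with the description of the Even Process support as sequences whose bounded blocks of 1s have even length.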

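We denote the corresponding (stationary, ergodic) process over bi-infinite symbol-state sequences $(\overleftrightarrow{x}, \overleftrightarrow{s})$ by $\widetilde{\mathcal{P}}$.
That is, $\widetilde{\mathcal{P}} = ((\mathcal{X} \times \mathcal{S})^{\mathbb{Z}}, (\mathbb{X} \times \mathbb{S}), \widetilde{\mathbb{P}})$ where:

1. $(\mathcal{X} \times \mathcal{S})^{\mathbb{Z}} = \{ (x_L, s_L)_{L \in \mathbb{Z}} = ((x_L)_{L \in \mathbb{Z}}, (s_L)_{L \in \mathbb{Z}}) = (\overleftrightarrow{x}, \overleftrightarrow{s}) : x_L \in \mathcal{X} \text{ and } s_L \in \mathcal{S}, \forall L \in \mathbb{Z} \}$.

2. $(\mathbb{X} \times \mathbb{S})$ is the σ-algebra generated by finite cylinder sets on the bi-infinite symbol-state sequences.

3. The (stationary) probability measure $\widetilde{\mathbb{P}}$ on $(\mathbb{X} \times \mathbb{S})$ is defined by Eq. (8) with $\rho = \pi$. Specifically, for any
length-$L$ word $w$ and length-$L$ state sequence $s$ we have:

$$ \widetilde{\mathbb{P}}(\{ (\overleftrightarrow{x}, \overleftrightarrow{s}) : x_0 \ldots x_{L-1} = w, s_1 \ldots s_L = s \}) = \mathbb{P}_{\pi}(w, s) . $$

By stationarity, this measure may be extended uniquely to all finite cylinders and, hence, to all $(\mathbb{X} \times \mathbb{S})$ measurable
sets. And, it is consistent with the measure $\mathbb{P}$ in that:

$$ \widetilde{\mathbb{P}}(\{ (\overleftrightarrow{x}, \overleftrightarrow{s}) : x_0 \ldots x_{L-1} = w \}) = \mathbb{P}(w) , $$

for all $w \in \mathcal{X}^*$.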

Definition 5. A generator ε-machine is an HMM with the following properties:

1. The graph $G$ associated with the HMM is strongly connected.

2. Unifilarity: For each state $\sigma_k \in \mathcal{S}$ and each symbol $x \in \mathcal{X}$ there is at most one outgoing edge from state $\sigma_k$
labeled with symbol $x$.

3. Probabilistically distinct states: For each pair of distinct states $\sigma_k, \sigma_j \in \mathcal{S}$ there exists some word $w \in \mathcal{X}^*$ such
that $\mathbb{P}_{\sigma_k}(w) \neq \mathbb{P}_{\sigma_j}(w)$.

Since any generator ε-machine has a strongly connected graph, we can associate to each generator ε-machine $M_G$
a unique stationary, ergodic process $\mathcal{P} = \mathcal{P}_{M_G}$ with word probabilities defined as in Eq. (14). We refer to $\mathcal{P}$ as the
process generated by the generator ε-machine $M_G$.

Example. The Even Process machine of Fig. 1 is also an ε-machine: the graph is strongly connected, the transitions
are unifilar, and the states are probabilistically distinct: state $\sigma_1$ can generate the symbol 0, but state $\sigma_2$ cannot.
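
The three properties of Definition 5 can be checked mechanically for a small HMM. The sketch below is illustrative, with hypothetical function names; in particular, the test for probabilistically distinct states only searches words up to a chosen length `max_len`, so finding a distinguishing word certifies distinctness, whereas failing to find one within that bound is merely suggestive.

```python
from itertools import product


def _n_states(T):
    return len(T[next(iter(T))])


def is_strongly_connected(T):
    """Every state can reach every other state along positive-probability edges."""
    n = _n_states(T)
    adj = [{j for x in T for j in range(n) if T[x][i][j] > 0} for i in range(n)]

    def reachable(start):
        seen, stack = {start}, [start]
        while stack:
            i = stack.pop()
            for j in adj[i] - seen:
                seen.add(j)
                stack.append(j)
        return seen

    return all(len(reachable(i)) == n for i in range(n))


def is_unifilar(T):
    """At most one outgoing edge per (state, symbol) pair."""
    n = _n_states(T)
    return all(sum(1 for j in range(n) if T[x][i][j] > 0) <= 1 for x in T for i in range(n))


def state_word_probability(T, i, w):
    """P_{sigma_i}(w) = ||e_i T^(w)||_1, as in Eq. (10)."""
    n = _n_states(T)
    v = [1.0 if k == i else 0.0 for k in range(n)]
    for x in w:
        v = [sum(v[a] * T[x][a][b] for a in range(n)) for b in range(n)]
    return sum(v)


def distinguishing_word(T, i, j, max_len=4, tol=1e-12):
    """Return a word w with P_i(w) != P_j(w), searching lengths 1..max_len, else None."""
    for length in range(1, max_len + 1):
        for letters in product(sorted(T), repeat=length):
            gap = state_word_probability(T, i, letters) - state_word_probability(T, j, letters)
            if abs(gap) > tol:
                return "".join(letters)
    return None


def looks_like_generator_epsilon_machine(T, max_len=4):
    n = _n_states(T)
    distinct = all(distinguishing_word(T, i, j, max_len) is not None
                   for i in range(n) for j in range(i + 1, n))
    return is_strongly_connected(T) and is_unifilar(T) and distinct
```

Applied to the Even Process matrices assumed earlier (symbol 0 only on a self-loop of the first state), the search returns the single symbol 0 as a distinguishing word, matching the example above.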

Finally, for any unifilar HMM we denote the state-symbol-state transition function as $\delta$. That is, for $k, x$ with
$\mathbb{P}_{\sigma_k}(x) > 0$, we define $\delta(\sigma_k, x) = \sigma_j$, where $\sigma_j$ is the (unique) state to which state $\sigma_k$ transitions on symbol $x$. It
follows immediately from Eq. (10) and the definition of matrix multiplication that for any unifilar HMM and any
word $w = w_0 \ldots w_{n-1}$ with $\mathbb{P}_{\sigma_k}(w_0 \ldots w_{n-2}) > 0$:

$$ \mathbb{P}_{\sigma_k}(w) = \prod_{m=0}^{n-1} \mathbb{P}_{\sigma_k^m}(w_m) , \tag{15} $$

where $\sigma_k^0 = \sigma_k$ and $\sigma_k^m = \delta(\sigma_k^{m-1}, w_{m-1})$, for $1 \leq m \leq n - 1$. (Here, if $|w| = 1$, then $w_0 \ldots w_{n-2} = \lambda$, the null word,
and $\mathbb{P}_{\sigma_k}(\lambda) = 1$ for all $k$.)



C. History ε-Machines



The history ε-machine for a stationary process $\mathcal{P}$ is, roughly speaking, the hidden Markov model whose states are
the equivalence classes of infinite past sequences $\overleftarrow{x} = \ldots x_{-2} x_{-1}$ with the same probability distributions over future
sequences $\overrightarrow{x} = x_0 x_1 \ldots$. However, it takes some effort to make this notion precise. The formal definition itself is
quite lengthy, so for clarity of presentation the verification of many technicalities has been deferred to appendices.
We recommend first reading through this section in its entirety without reference to the appendices for an overview
and, then, reading through the appendices separately afterward for the details. The appendices are entirely
self-contained in that, except for the notation introduced here, none of the results derived in the appendices relies on
the development in this section. As noted before, our definition is restricted to ergodic, finite-alphabet processes to
parallel the generator definition. Neither of these requirements is strictly necessary, however; only stationarity is
actually needed.

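Let $\mathcal{P} = (\mathcal{X}^{\mathbb{Z}}, \mathbb{X}, \mathbb{P})$ be a stationary, ergodic process over a finite alphabet $\mathcal{X}$, and let $(\mathcal{X}^{-}, \mathbb{X}^{-}, \mathbb{P}^{-})$ be the
corresponding probability space over past sequences $\overleftarrow{x}$. That is:

• $\mathcal{X}^{-}$ is the set of infinite past sequences of symbols in $\mathcal{X}$: $\mathcal{X}^{-} = \{\overleftarrow{x} = \ldots x_{-2} x_{-1} : x_L \in \mathcal{X}, L = -1, -2, \ldots\}$.

• $\mathbb{X}^{-}$ is the σ-algebra generated by finite cylinder sets on past sequences: $\mathbb{X}^{-} = \sigma\left(\bigcup_{L=1}^{\infty} \mathbb{X}^{-}_L\right)$, where $\mathbb{X}^{-}_L = \sigma(\{A^{-}_w : |w| = L\})$ and $A^{-}_w = \{\overleftarrow{x} = \ldots x_{-2} x_{-1} : x_{-|w|} \ldots x_{-1} = w\}$.

• $\mathbb{P}^{-}$ is the probability measure on the measurable space $(\mathcal{X}^{-}, \mathbb{X}^{-})$ which is the projection of $\mathbb{P}$ to past sequences:
$\mathbb{P}^{-}(A^{-}_w) = \mathbb{P}(w)$ for each $w \in \mathcal{X}^*$.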

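For a given past $\overleftarrow{x} \in \mathcal{X}^{-}$, we denote the most recent $L$ symbols of $\overleftarrow{x}$ as $\overleftarrow{x}^L = x_{-L} \ldots x_{-1}$. A past $\overleftarrow{x} \in \mathcal{X}^{-}$ is
said to be trivial if $\mathbb{P}(\overleftarrow{x}^L) = 0$ for some finite $L$ and nontrivial otherwise. If a past $\overleftarrow{x}$ is nontrivial, then for each
$w \in \mathcal{X}^*$, $\mathbb{P}(w|\overleftarrow{x}^L)$ is well defined for each $L$ (Eq. (2)) and one may consider $\lim_{L \to \infty} \mathbb{P}(w|\overleftarrow{x}^L)$. A nontrivial past $\overleftarrow{x}$
is said to be $w$-regular if $\lim_{L \to \infty} \mathbb{P}(w|\overleftarrow{x}^L)$ exists and regular if it is $w$-regular for each $w \in \mathcal{X}^*$. Appendix A shows
that the set of trivial pasts $\mathcal{T}$ is a null set and that the set of regular pasts $\mathcal{R}$ has full measure. That is, $\mathbb{P}^{-}(\mathcal{T}) = 0$
and $\mathbb{P}^{-}(\mathcal{R}) = 1$.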

For a word $w \in \mathcal{X}^*$ the function $\mathbb{P}(w|\cdot) : \mathcal{R} \to \mathbb{R}$ is defined by:

$$ \mathbb{P}(w|\overleftarrow{x}) = \lim_{L \to \infty} \mathbb{P}(w|\overleftarrow{x}^L) . \tag{16} $$

Intuitively, $\mathbb{P}(w|\overleftarrow{x})$ is the conditional probability of $w$ given $\overleftarrow{x}$. However, this probability is technically not well
defined in the sense of Eq. (2), since the probability of each $\overleftarrow{x}$ is normally 0. And, we do not wish to interpret
$\mathbb{P}(w|\overleftarrow{x})$ in the sense of a formal conditional expectation (at this point in time), because such a definition is only
unique up to a.e. equivalence, while we are concerned with its value on individual pasts. Nevertheless, intuitively
speaking, $\mathbb{P}(w|\overleftarrow{x})$ is the conditional probability of $w$ given $\overleftarrow{x}$, and this intuition should be kept in mind as it will
provide understanding for what follows.

The central idea in the construction of the history ε-machine is the definition of the following equivalence relation
$\sim$ on the set of regular pasts:

$$ \overleftarrow{x} \sim \overleftarrow{x}' \text{ if } \mathbb{P}(w|\overleftarrow{x}) = \mathbb{P}(w|\overleftarrow{x}') , \text{ for all } w \in \mathcal{X}^* . \tag{17} $$

That is, two pasts $\overleftarrow{x}$ and $\overleftarrow{x}'$ are $\sim$ equivalent if their predictions are the same: conditioning on either past leads to
the same probability distribution over future words of all lengths.

The set of equivalence classes of regular pasts under the relation $\sim$ is denoted as $\mathcal{E} = \{E_\beta, \beta \in B\}$. In general,
there may be finitely many, countably many, or uncountably many such equivalence classes. Examples are shown in
Figs. 14, 15, and 17 of Ref. [IX. 

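For an equivalence class $E_\beta \in \mathcal{E}$ and word $w \in \mathcal{X}^*$ we define the probability of $w$ given $E_\beta$ as:

$$ \mathbb{P}(w|E_\beta) = \mathbb{P}(w|\overleftarrow{x}) , \quad \overleftarrow{x} \in E_\beta . \tag{18} $$

By construction of the equivalence classes this definition is independent of the representative $\overleftarrow{x} \in E_\beta$, and Appendix
B shows that these probabilities are normalized, so that for each equivalence class $E_\beta$:

$$ \sum_{x \in \mathcal{X}} \mathbb{P}(x|E_\beta) = 1 . \tag{19} $$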



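Also, App. B shows that the equivalence-class-to-equivalence-class transitions for the relation $\sim$ are well defined.
That is:

1. For any regular past $\overleftarrow{x}$ and symbol $x \in \mathcal{X}$ with $\mathbb{P}(x|\overleftarrow{x}) > 0$, the past $\overleftarrow{x}x$ is also regular.

2. If $\overleftarrow{x}, \overleftarrow{x}'$ are two regular pasts in the same equivalence class $E_\beta$ and $\mathbb{P}(x|E_\beta) > 0$, then the two pasts $\overleftarrow{x}x$ and
$\overleftarrow{x}'x$ must also be in the same equivalence class.

So, for each $E_\beta \in \mathcal{E}$ and $x \in \mathcal{X}$ with $\mathbb{P}(x|E_\beta) > 0$ there is a unique equivalence class $E_\alpha = \delta(E_\beta, x)$ to which
equivalence class $E_\beta$ transitions on symbol $x$. That is, $\delta(E_\beta, x)$ is defined by the relation:

$$ \delta(E_\beta, x) = E_\alpha \text{ where } \overleftarrow{x}x \in E_\alpha \text{ for } \overleftarrow{x} \in E_\beta . \tag{20} $$

By Point 2 above, this definition is again independent of the representative $\overleftarrow{x} \in E_\beta$.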

Appendix C shows that each equivalence class $E_\beta$ is an $\mathbb{X}^{-}$ measurable set, so we can meaningfully assign a
probability:

$$ \mathbb{P}(E_\beta) = \mathbb{P}^{-}(\{\overleftarrow{x} \in E_\beta\}) = \mathbb{P}(\{\overleftrightarrow{x} = \overleftarrow{x}\overrightarrow{x} : \overleftarrow{x} \in E_\beta\}) \tag{21} $$

to each equivalence class $E_\beta$. We say a process $\mathcal{P}$ is finitely characterized if there exists a finite number of
positive-probability equivalence classes $E_1, \ldots, E_N$ that together comprise a set of full measure: $\mathbb{P}(E_k) > 0$ for each $1 \leq k \leq N$
and $\sum_{k=1}^{N} \mathbb{P}(E_k) = 1$. For a finitely characterized process $\mathcal{P}$ we also sometimes say, by a slight abuse of terminology,
that $\mathcal{E}^{+} = \{E_1, \ldots, E_N\}$ is the set of equivalence classes of pasts and ignore the remaining measure-zero subset of
equivalence classes.

Appendix E shows that for any finitely characterized process $\mathcal{P}$, the transitions from the positive probability
equivalence classes $E_i \in \mathcal{E}^{+}$ all go to other positive probability equivalence classes. That is, if $E_i \in \mathcal{E}^{+}$ and $\mathbb{P}(x|E_i) > 0$,
then:

$$ \delta(E_i, x) \in \mathcal{E}^{+} . \tag{22} $$

As such, we define symbol-labeled transition matrices $T^{(x)}$, $x \in \mathcal{X}$, between the equivalence classes $E_i \in \mathcal{E}^{+}$. A
component $T^{(x)}_{ij}$ of the matrix $T^{(x)}$ gives the probability that equivalence class $E_i$ transitions to equivalence class $E_j$
on symbol $x$:

$$ T^{(x)}_{ij} = \mathbb{P}(E_i \xrightarrow{x} E_j) = I(x, i, j) \cdot \mathbb{P}(x|E_i) , \tag{23} $$

where $I(x, i, j)$ is the indicator function of the transition from $E_i$ to $E_j$ on symbol $x$:

$$ I(x, i, j) = \begin{cases} 1 & \text{if } \mathbb{P}(x|E_i) > 0 \text{ and } \delta(E_i, x) = E_j , \\ 0 & \text{otherwise.} \end{cases} \tag{24} $$

It follows from Eqs. (19) and (22) that the matrix $T = \sum_{x \in \mathcal{X}} T^{(x)}$ is stochastic. (See also Claim 17 in App. E.)

Definition 6. Let $\mathcal{P} = (\mathcal{X}^{\mathbb{Z}}, \mathbb{X}, \mathbb{P})$ be a finitely characterized, stationary, ergodic, finite-alphabet process. The history
ε-machine $M_H(\mathcal{P})$ is defined as the triple $(\mathcal{E}^{+}, \mathcal{X}, \{T^{(x)}\})$.

Note that without referring to the original process $\mathcal{P}$, the state set $\mathcal{E}^{+}$, alphabet $\mathcal{X}$, and transition matrices $\{T^{(x)}\}$
of Def. 6 together define a valid edge-label hidden Markov model, since $T$ is stochastic. This is critical in establishing
the equivalence of history and generator ε-machines, since a history ε-machine, when viewed as a hidden Markov
model, is also a process generator.



III. EQUIVALENCE 

We will show that the two ε-machine definitions, history and generator, are equivalent in the following sense:

1. If $\mathcal{P}$ is the process generated by a generator ε-machine $M_G$, then $\mathcal{P}$ is finitely characterized and the history
ε-machine $M_H(\mathcal{P})$ is isomorphic to $M_G$ as a hidden Markov model.

2. If $\mathcal{P}$ is a finitely characterized, stationary, ergodic, finite-alphabet process, then the history ε-machine $M_H(\mathcal{P})$,
when considered as a hidden Markov model, is also a generator ε-machine; i.e., it has a strongly connected
graph, unifilar transitions, and probabilistically distinct states. And, the process $\mathcal{P}'$ it generates is the same as
the original process $\mathcal{P}$ from which the history machine was derived.

That is, there is a 1-1 correspondence between finite-state generator ε-machines and finite-state history ε-machines.
Every generator is also a history machine (for the same process $\mathcal{P}$ it generates), and every history machine is also a
generator (for the same process $\mathcal{P}$ from which it was derived).



A. Generator ε-Machines are History ε-Machines

The purpose of this section is to establish the following:

Theorem 1. If $\mathcal{P} = (\mathcal{X}^{\mathbb{Z}}, \mathbb{X}, \mathbb{P})$ is the process generated by a generator ε-machine $M_G$, then $\mathcal{P}$ is finitely characterized
and the history ε-machine $M_H(\mathcal{P})$ is isomorphic to $M_G$ as a hidden Markov model.

The key ideas in proving this theorem come from the study of synchronization to finite-state generator ε-machines
[8, 9]. In order to state these ideas precisely, however, we first need to introduce some terminology.

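Let $M_G$ be a generator ε-machine, and let $\mathcal{P} = (\mathcal{X}^{\mathbb{Z}}, \mathbb{X}, \mathbb{P})$ and $\widetilde{\mathcal{P}} = ((\mathcal{X} \times \mathcal{S})^{\mathbb{Z}}, (\mathbb{X} \times \mathbb{S}), \widetilde{\mathbb{P}})$ be the associated
symbol and symbol-state processes generated by $M_G$ as in Sec. II B above. Furthermore, let the random variables
$X_L : (\mathcal{X} \times \mathcal{S})^{\mathbb{Z}} \to \mathcal{X}$ and $S_L : (\mathcal{X} \times \mathcal{S})^{\mathbb{Z}} \to \mathcal{S}$ be the natural projections $X_L(\overleftrightarrow{x}, \overleftrightarrow{s}) = x_L$ and $S_L(\overleftrightarrow{x}, \overleftrightarrow{s}) = s_L$, and
let $\overrightarrow{X}^L = X_0 \ldots X_{L-1}$ and $\overleftarrow{X}^L = X_{-L} \ldots X_{-1}$.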

The process language $\mathcal{L}(\mathcal{P})$ is the set of words $w$ of positive probability: $\mathcal{L}(\mathcal{P}) = \{w \in \mathcal{X}^* : \mathbb{P}(w) > 0\}$. For a given
word $w \in \mathcal{L}(\mathcal{P})$, we define $\phi(w) = \widetilde{\mathbb{P}}(\mathcal{S}|w)$ to be an observer's belief distribution as to the machine's current state
after observing the word $w$. Specifically, for a length-$L$ word $w \in \mathcal{L}(\mathcal{P})$, $\phi(w)$ is a probability distribution over the
machine states $\{\sigma_1, \ldots, \sigma_N\}$ whose $k$th component is:

$$ \phi(w)_k = \widetilde{\mathbb{P}}(S_0 = \sigma_k | \overleftarrow{X}^L = w) = \widetilde{\mathbb{P}}(S_0 = \sigma_k, \overleftarrow{X}^L = w) / \widetilde{\mathbb{P}}(\overleftarrow{X}^L = w) . \tag{25} $$

For a word $w \notin \mathcal{L}(\mathcal{P})$ we will, by convention, take $\phi(w) = \pi$.

For any word $w$, $\sigma(w)$ is defined to be the most likely machine state at the current time given that the word $w$
was just observed. That is, $\sigma(w) = \sigma_{k^*}$, where $k^*$ is defined by the relation $\phi(w)_{k^*} = \max_k \phi(w)_k$. In the case of a
tie, $k^*$ is taken to be the lowest value of the index $k$ maximizing the quantity $\phi(w)_k$. Also, $P(w)$ is defined to be the
probability of the most likely state after observing $w$ and $Q(w)$ is defined to be the combined probability of all other
states after observing $w$:

$$ P(w) = \phi(w)_{k^*} \tag{26} $$

and

$$ Q(w) = \sum_{k \neq k^*} \phi(w)_k = 1 - P(w) . \tag{27} $$

So, for example, if $\phi(w) = (0.2, 0.7, 0.1)$ then $\sigma(w) = \sigma_2$, $P(w) = 0.7$, and $Q(w) = 0.3$.
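
The belief distribution and the derived quantities $\sigma(w)$, $P(w)$, and $Q(w)$ can be computed directly from the transition matrices. The following minimal sketch assumes a stationary start, so that $\phi(w)$ is the normalized row vector $\pi T^{(w)}$; the function names are hypothetical and states are indexed from 0, so index 1 corresponds to $\sigma_2$.

```python
def belief(T, pi, w):
    """phi(w): distribution over the current state after observing w from a stationary start."""
    n = len(pi)
    v = list(pi)
    for x in w:
        v = [sum(v[i] * T[x][i][j] for i in range(n)) for j in range(n)]
    total = sum(v)  # equals the word probability P(w)
    if total == 0.0:
        return list(pi)  # convention phi(w) = pi for words outside the process language
    return [a / total for a in v]


def most_likely_state(phi):
    """Return (k_star, P, Q): lowest-index maximizer, its probability, and the doubt 1 - P."""
    k_star = phi.index(max(phi))  # list.index returns the lowest maximizing index
    return k_star, phi[k_star], 1.0 - phi[k_star]


if __name__ == "__main__":
    print(most_likely_state([0.2, 0.7, 0.1]))
```

Using `list.index` on the maximum implements the tie-breaking rule stated above, since it returns the lowest index attaining the maximum.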

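The most recent $L$ symbols are described by the block random variable $\overleftarrow{X}^L$, and so we define the corresponding
random variables $\Phi_L = \phi(\overleftarrow{X}^L)$, $\overline{S}_L = \sigma(\overleftarrow{X}^L)$, $P_L = P(\overleftarrow{X}^L)$, and $Q_L = Q(\overleftarrow{X}^L)$. Although the values depend
only on the symbol sequence $\overleftrightarrow{x}$, formally we think of $\Phi_L$, $\overline{S}_L$, $P_L$, and $Q_L$ as defined on the cross product space
$(\mathcal{X} \times \mathcal{S})^{\mathbb{Z}}$. Their realizations are denoted with lowercase letters $\phi_L$, $\overline{s}_L$, $p_L$, and $q_L$, so that for a given realization
$(\overleftrightarrow{x}, \overleftrightarrow{s}) \in (\mathcal{X} \times \mathcal{S})^{\mathbb{Z}}$, $\phi_L = \phi(\overleftarrow{x}^L)$, $\overline{s}_L = \sigma(\overleftarrow{x}^L)$, $p_L = P(\overleftarrow{x}^L)$, and $q_L = Q(\overleftarrow{x}^L)$. The primary result we use is the
following exponential decay bound on the quantity $Q_L$.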

Lemma 1. For any generator ε-machine $M_G$ there exist constants $K > 0$ and $0 < \alpha < 1$ such that:

$$ \widetilde{\mathbb{P}}(Q_L > \alpha^L) \leq K \alpha^L , \text{ for all } L \in \mathbb{N} . \tag{28} $$

Proof. This follows directly from the Exact Machine Synchronization Theorem of Ref. [8] and the Nonexact Machine
Synchronization Theorem of Ref. [1] by stationarity. (Note that the notation used in those papers differs slightly
from that here, by a time shift of length $L$. That is, $Q_L$ in those papers refers to the observer's doubt in $S_L$ given
$\overrightarrow{X}^L$ instead of the observer's doubt in $S_0$ given $\overleftarrow{X}^L$.) □
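
The content of Lemma 1 can be explored numerically on a small example by enumerating all words of a given length. The sketch below is an illustration, not part of the proof: it uses a fixed small threshold rather than $\alpha^L$, the function name is hypothetical, and the Even Process matrices and their stationary distribution are assumptions inferred from the verbal description of that process.

```python
from itertools import product


def doubt_tail(T, pi, L, threshold):
    """P(Q_L > threshold): total probability of length-L words w whose doubt Q(w) exceeds threshold."""
    n = len(pi)
    tail = 0.0
    for letters in product(sorted(T), repeat=L):
        v = list(pi)
        for x in letters:
            v = [sum(v[i] * T[x][i][j] for i in range(n)) for j in range(n)]
        p_w = sum(v)  # word probability P(w)
        # Q(w) = 1 - max_k phi(w)_k, with phi(w) = v / p_w.
        if p_w > 0.0 and 1.0 - max(v) / p_w > threshold:
            tail += p_w
    return tail


if __name__ == "__main__":
    p = 0.5
    # Assumed Even Process matrices (index 0 = sigma_1, index 1 = sigma_2).
    T = {"0": [[p, 0.0], [0.0, 0.0]], "1": [[0.0, 1.0 - p], [1.0, 0.0]]}
    pi = [1.0 / (2.0 - p), (1.0 - p) / (2.0 - p)]  # solves pi = pi T for this T
    for L in range(1, 9):
        print(L, doubt_tail(T, pi, L, threshold=1e-9))
```

In this example every word containing a 0 leaves no doubt about the current state, so the only contribution comes from the all-1s word, whose probability shrinks as $L$ grows.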

Essentially this lemma says that after observing a block of $L$ symbols it is exponentially unlikely that an observer's
"doubt" $Q_L$ in the machine state will be more than exponentially small. Using this lemma we now prove Thm. 1.

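Proof. (Theorem 1) Let $M_G$ be a generator ε-machine with state set $\mathcal{S} = \{\sigma_1, \ldots, \sigma_N\}$ and stationary distribution
$\pi = (\pi_1, \ldots, \pi_N)$. Let $\mathcal{P}$ and $\widetilde{\mathcal{P}}$ be the associated symbol and symbol-state processes generated by $M_G$. By Lemma
1 there exist constants $K > 0$ and $0 < \alpha < 1$ such that $\widetilde{\mathbb{P}}(Q_L > \alpha^L) \leq K \alpha^L$, for all $L \in \mathbb{N}$. Let us define sets:

$$ V_L = \{ (\overleftrightarrow{x}, \overleftrightarrow{s}) : q_L \leq \alpha^L , s_0 = \overline{s}_L \} , $$

$$ V'_L = \{ (\overleftrightarrow{x}, \overleftrightarrow{s}) : q_L \leq \alpha^L , s_0 \neq \overline{s}_L \} , $$

$$ W_L = \{ (\overleftrightarrow{x}, \overleftrightarrow{s}) : q_L > \alpha^L \} , \text{ and} $$

$$ U_L = W_L \cup V'_L . $$

Then, we have:

$$ \widetilde{\mathbb{P}}(U_L) = \widetilde{\mathbb{P}}(V'_L) + \widetilde{\mathbb{P}}(W_L) \leq \alpha^L + K \alpha^L = (1 + K) \alpha^L . $$

Here $\widetilde{\mathbb{P}}(W_L) \leq K \alpha^L$ by Lemma 1, and $\widetilde{\mathbb{P}}(V'_L) \leq \alpha^L$ because, given the observed block $\overleftarrow{x}^L$, the probability that
$S_0$ differs from the most likely state $\overline{s}_L$ is exactly $q_L$, which is at most $\alpha^L$ on $V'_L$. So:

$$ \sum_{L=1}^{\infty} \widetilde{\mathbb{P}}(U_L) \leq \sum_{L=1}^{\infty} (1 + K) \alpha^L < \infty . $$

Hence, by the Borel-Cantelli Lemma, $\widetilde{\mathbb{P}}(U_L \text{ occurs infinitely often}) = 0$. Or, equivalently, for $\widetilde{\mathbb{P}}$ a.e. $(\overleftrightarrow{x}, \overleftrightarrow{s})$ there
exists $L_0 \in \mathbb{N}$ such that $(\overleftrightarrow{x}, \overleftrightarrow{s}) \in V_L$ for all $L \geq L_0$. Now define: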



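$$ C = \{ (\overleftrightarrow{x}, \overleftrightarrow{s}) : \text{there exists } L_0 \in \mathbb{N} \text{ such that } (\overleftrightarrow{x}, \overleftrightarrow{s}) \in V_L \text{ for all } L \geq L_0 \} , $$

$$ D_k = \{ (\overleftrightarrow{x}, \overleftrightarrow{s}) : s_0 = \sigma_k \} , \text{ and} $$

$$ C_k = C \cap D_k . $$

According to the above discussion $\widetilde{\mathbb{P}}(C) = 1$ and, clearly, $\widetilde{\mathbb{P}}(D_k) = \pi_k$. Thus, $\widetilde{\mathbb{P}}(C_k) = \widetilde{\mathbb{P}}(C \cap D_k) = \pi_k$. Also, by the
convention for $\phi(w)$, $w \notin \mathcal{L}(\mathcal{P})$, we know that for every $(\overleftrightarrow{x}, \overleftrightarrow{s}) \in C_k$, the corresponding symbol past $\overleftarrow{x}$ is nontrivial.
So, the conditional probabilities $\mathbb{P}(w|\overleftarrow{x}^L)$ are well defined for each $L$.

Now, given any $(\overleftrightarrow{x}, \overleftrightarrow{s}) \in C_k$ take $L_0$ sufficiently large so that for all $L \geq L_0$, $(\overleftrightarrow{x}, \overleftrightarrow{s}) \in V_L$. Then, for $L \geq L_0$,
$\overline{s}_L = \sigma_k$ and $q_L \leq \alpha^L$. So, for any word $w \in \mathcal{X}^*$ and any $L \geq L_0$, we have, writing $\phi_{L,j}$ for the $j$th component of $\phi_L$:

$$ \begin{aligned}
\left| \mathbb{P}(w|\overleftarrow{x}^L) - \mathbb{P}_{\sigma_k}(w) \right|
&\overset{(*)}{=} \Big| \sum_{j} \phi_{L,j} \, \mathbb{P}_{\sigma_j}(w) - \mathbb{P}_{\sigma_k}(w) \Big| \\
&= \Big| \sum_{j \neq k} \phi_{L,j} \, \mathbb{P}_{\sigma_j}(w) - (1 - \phi_{L,k}) \, \mathbb{P}_{\sigma_k}(w) \Big| \\
&\leq \sum_{j \neq k} \phi_{L,j} + (1 - \phi_{L,k}) \\
&= 2 q_L \leq 2 \alpha^L .
\end{aligned} $$

Step (*) follows from the fact that $\overrightarrow{X}^m$ and $\overleftarrow{X}^n$ are conditionally independent given $S_0$ for any $m, n \in \mathbb{N}$ by construction
of the measure $\widetilde{\mathbb{P}}$. Since $|\mathbb{P}(w|\overleftarrow{x}^L) - \mathbb{P}_{\sigma_k}(w)| \leq 2 \alpha^L$ for all $L \geq L_0$, we know $\lim_{L \to \infty} \mathbb{P}(w|\overleftarrow{x}^L) = \mathbb{P}_{\sigma_k}(w)$ exists.
Since this holds for all $w \in \mathcal{X}^*$, we know $\overleftarrow{x}$ is regular and $\mathbb{P}(w|\overleftarrow{x}) = \mathbb{P}_{\sigma_k}(w)$ for all $w \in \mathcal{X}^*$.

Now, let us define equivalence classes $E_k$, $k = 1, \ldots, N$, by:

$$ E_k = \{ \overleftarrow{x} : \overleftarrow{x} \text{ is regular and } \mathbb{P}(w|\overleftarrow{x}) = \mathbb{P}_{\sigma_k}(w) \text{ for all } w \in \mathcal{X}^* \} . $$

And, also, for each $k = 1, \ldots, N$ let:

$$ \widetilde{E}_k = \{ (\overleftrightarrow{x}, \overleftrightarrow{s}) : \overleftarrow{x} \in E_k \} . $$

By results from App. C we know that each equivalence class $E_k$ is measurable, so each set $\widetilde{E}_k$ is also measurable with
$\widetilde{\mathbb{P}}(\widetilde{E}_k) = \mathbb{P}(E_k)$. And, for each $k$, $C_k \subseteq \widetilde{E}_k$, so $\mathbb{P}(E_k) = \widetilde{\mathbb{P}}(\widetilde{E}_k) \geq \widetilde{\mathbb{P}}(C_k) = \pi_k$. Since $\sum_{k=1}^{N} \pi_k = 1$ and the equivalence
classes $E_k$, $k = 1, \ldots, N$, are all disjoint, it follows that $\mathbb{P}(E_k) = \pi_k$ for each $k$ and $\sum_{k=1}^{N} \mathbb{P}(E_k) = \sum_{k=1}^{N} \pi_k = 1$. Hence,
the process $\mathcal{P}$ is finitely characterized with equivalence classes $\mathcal{E}^{+} = \{E_1, \ldots, E_N\}$.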

Moreover, the equivalence classes $\{E_1, \ldots, E_N\}$ (the history ε-machine states) have a natural one-to-one
correspondence with the states of the generator ε-machine: $E_k \leftrightarrow \sigma_k$, $k = 1, \ldots, N$. It remains only to verify that this
bijection is also edge preserving and, thus, an isomorphism. That is, we must show that:

1. For each $k = 1, \ldots, N$ and $x \in \mathcal{X}$, $\mathbb{P}(x|E_k) = \mathbb{P}_{\sigma_k}(x)$.

2. For all $k, x$ with $\mathbb{P}(x|E_k) = \mathbb{P}_{\sigma_k}(x) > 0$, $\delta(E_k, x) \cong \delta(\sigma_k, x)$. That is, if $\delta(E_k, x) = E_j$ and $\delta(\sigma_k, x) = \sigma_{j'}$, then
$j = j'$.

Here, $\delta(E_k, x)$ is the equivalence-class-to-equivalence-class transition function $\delta$ as defined in Eq. (20) and $\delta(\sigma_k, x)$ is
the unifilar HMM transition function $\delta$ as defined in Sec. II B.

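Point 1 follows directly from the definition of $E_k$. To show Point 2, take any $k, x$ with $\mathbb{P}(x|E_k) = \mathbb{P}_{\sigma_k}(x) > 0$ and
let $\delta(E_k, x) = E_j$ and $\delta(\sigma_k, x) = \sigma_{j'}$. Then, for any word $w \in \mathcal{X}^*$, we have:

(i) $\mathbb{P}(xw|E_k) = \mathbb{P}_{\sigma_k}(xw)$, by definition of the equivalence class $E_k$,

(ii) $\mathbb{P}(xw|E_k) = \mathbb{P}(x|E_k) \cdot \mathbb{P}(w|E_j)$, by Claim 11 in the appendices, and

(iii) $\mathbb{P}_{\sigma_k}(xw) = \mathbb{P}_{\sigma_k}(x) \cdot \mathbb{P}_{\sigma_{j'}}(w)$, by Eq. (10) applied to a unifilar HMM.

Since $\mathbb{P}(x|E_k) = \mathbb{P}_{\sigma_k}(x) > 0$, it follows that $\mathbb{P}(w|E_j) = \mathbb{P}_{\sigma_{j'}}(w)$. Since this holds for all $w \in \mathcal{X}^*$ and the states of
the generator are probabilistically distinct, by assumption, it follows that $j = j'$.

□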

Corollary 1. Generator ε-machines are unique: Two generator ε-machines $M_{G_1}$ and $M_{G_2}$ that generate the same
process $\mathcal{P}$ are isomorphic.

Proof. By Thm. 1 the two generator ε-machines are both isomorphic to the process's history ε-machine $M_H(\mathcal{P})$ and,
hence, isomorphic to each other. □

Remark. Unlike history ε-machines that are unique by construction, generator ε-machines are not by definition
unique. And, it is not a priori clear that they must be. Indeed, general HMMs are not unique. There are infinitely
many nonisomorphic HMMs for any given process $\mathcal{P}$ generated by some HMM. Moreover, if either the unifilarity or
probabilistically-distinct-states condition is removed from the definition of generator ε-machines, then uniqueness no
longer holds. It is only when both of these properties are required together that one obtains uniqueness.

B. History ε-Machines are Generator ε-Machines

The purpose of this section is to establish the following:

Theorem 2. If $\mathcal{P}$ is a finitely characterized, stationary, ergodic, finite-alphabet process, then the history ε-machine
$M_H(\mathcal{P})$, when considered as a hidden Markov model, is also a generator ε-machine and the process $\mathcal{P}'$ it generates is
the same as the original process $\mathcal{P}$ from which the history machine was derived.

Note that by Claim 17 in App. E we know that for any finitely characterized, stationary, ergodic, finite-alphabet
process the history ε-machine $M_H(\mathcal{P}) = (\mathcal{E}^{+}, \mathcal{X}, \{T^{(x)}\})$ is a valid hidden Markov model. So, we need only show
that this HMM has the three properties of a generator ε-machine (strongly connected graph, unifilar transitions, and
probabilistically distinct states) and that the process $\mathcal{P}'$ generated by this HMM is the same as $\mathcal{P}$. To do so requires
several lemmas. Throughout $\mu = (\mu_1, \ldots, \mu_N) = (\mathbb{P}(E_1), \ldots, \mathbb{P}(E_N))$.

Lemma 2. The distribution $\mu$ over equivalence-class states is stationary for the transition matrix $T = \sum_{x \in \mathcal{X}} T^{(x)}$.
That is, for any $1 \leq j \leq N$, $\mu_j = \sum_{i=1}^{N} \mu_i T_{ij}$.

Proof. This follows directly from Claim 15 in App. E and the definition of the $T^{(x)}$ matrices. □



Lemma 3. The graph $G$ associated with the HMM $(\mathcal{E}^{+}, \mathcal{X}, \{T^{(x)}\})$ consists entirely of disjoint strongly connected
components. Each connected component of $G$ is strongly connected.

Proof. It is equivalent to show that the graphical representation of the associated Markov chain with state set $\mathcal{E}^{+}$
and transition matrix $T$ consists entirely of disjoint strongly connected components. But this follows directly from
existence of a stationary distribution $\mu$ with $\mu_k = \mathbb{P}(E_k) > 0$ for all $k$ [10]. □

Lemma 4. For any $E_k \in \mathcal{E}^+$ and $w \in \mathcal{X}^*$, $\mathbb{P}(w|E_k) = P_{E_k}(w)$, where $P_{E_k}(w) = \|e_k T^{(w)}\|_1$ is the probability of generating the word $w$ starting in state $E_k$ of the HMM $(\mathcal{E}^+, \mathcal{X}, \{T^{(x)}\})$, as defined in Sec. II B.

Proof. As noted in Sec. II C (Eq. (20)), for each equivalence class $E_k \in \mathcal{E}^+$ and symbol $x$ with $\mathbb{P}(x|E_k) > 0$, there is a unique equivalence class $\delta(E_k, x)$ to which equivalence class $E_k$ transitions on symbol $x$, and by Claim 16 in App. E, $\delta(E_k, x) \in \mathcal{E}^+$. It follows immediately from the construction of the HMM $(\mathcal{E}^+, \mathcal{X}, \{T^{(x)}\})$ that this HMM is unifilar and that its transition function, as defined in Sec. II B, is the same $\delta$. Also, by construction we have $\mathbb{P}(x|E_k) = P_{E_k}(x)$ for all $x \in \mathcal{X}$, so the statement holds for words of length $|w| = 1$.

To extend to words of length $|w| > 1$, let us write $w = vxu$, where $v$ is the longest proper prefix of $w$ such that $\mathbb{P}(v|E_k) > 0$, $x$ is the next symbol in $w$ after $v$, and $u$ is the remainder of the word $w$. Either $u$ or $v$ may be the null word $\lambda$. By definition, we take $\mathbb{P}(\lambda|E_k) = 1$ for all $E_k$, as in Claim 12 of App. D, so such a $v$ always exists.

Now, let $n = |vx|$, and let $w_m$ be the $m$th symbol of the word $w$, starting with $w_0$. Since $\mathbb{P}(v|E_k) > 0$, we know by Claim 12 that the states $E_k^m$, $0 \le m \le n-1$, defined by $E_k^0 = E_k$ and $E_k^m = \delta(E_k^{m-1}, w_{m-1})$ for $1 \le m \le n-1$, are well defined. Thus, there is an allowed state path in the graph of the HMM $(\mathcal{E}^+, \mathcal{X}, \{T^{(x)}\})$ from state $E_k$ following edges labeled with the symbols in the word $v$, so $P_{E_k}(v) > 0$.

Therefore, since $P_{E_k}(v)$ and $\mathbb{P}(v|E_k)$ are both nonzero and the HMM $(\mathcal{E}^+, \mathcal{X}, \{T^{(x)}\})$ is unifilar, we have by Claim 12 and Eq. (15):

$$\mathbb{P}(vx|E_k) = \prod_{m=0}^{n-1} \mathbb{P}(w_m|E_k^m) = \prod_{m=0}^{n-1} P_{E_k^m}(w_m) = P_{E_k}(vx) ,$$

where implicitly in the second equality we use the fact that both sets of $E_k^m$s are the same, since the transition function $\delta$ between equivalence classes is the same as the transition function $\delta$ between states of the HMM.

This proves the statement directly for the case $u = \lambda$. If $u \neq \lambda$, then $vx$ is a proper prefix of $w$ longer than $v$, so $\mathbb{P}(vx|E_k) = 0$. Hence $\|e_k T^{(vx)}\|_1 = P_{E_k}(vx) = 0$ and $e_k T^{(vx)}$ is the zero vector. And, hence:

$$P_{E_k}(w) = P_{E_k}(vxu) = \|e_k T^{(vxu)}\|_1 = \|(e_k T^{(vx)}) T^{(u)}\|_1 = 0 .$$

Also, if $u \neq \lambda$, then $\mathbb{P}(w_0 \ldots w_{|w|-2}|E_k) = 0$. So, by Claim 10 in App. D, $\mathbb{P}(w|E_k) = \mathbb{P}(w_0 \ldots w_{|w|-1}|E_k) = 0$. Thus, for $u \neq \lambda$, $P_{E_k}(w) = \mathbb{P}(w|E_k) = 0$.

□
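The two ways of computing a word probability that appear in this proof, the matrix form $\|e_k T^{(w)}\|_1$ and the product of symbol probabilities along the unique unifilar state path, can be compared directly. The following is a minimal sketch under the assumption of a hypothetical two-state unifilar HMM (the Even Process, not an example from this paper); the function names are illustrative.

```python
from fractions import Fraction as F

# Hypothetical unifilar HMM: T[x][i][j] = Pr(emit x, go to j | state i).
T = {
    "0": [[F(1, 2), F(0)], [F(0), F(0)]],
    "1": [[F(0), F(1, 2)], [F(1), F(0)]],
}

def word_prob_matrix(T, k, w):
    """||e_k T^(w)||_1 via successive row-vector/matrix products."""
    n = len(next(iter(T.values())))
    vec = [F(int(i == k)) for i in range(n)]
    for x in w:
        vec = [sum(vec[i] * T[x][i][j] for i in range(n)) for j in range(n)]
    return sum(vec)

def word_prob_path(T, k, w):
    """Product of symbol probabilities along the unique state path."""
    prob, state = F(1), k
    for x in w:
        row = T[x][state]
        successors = [j for j, p in enumerate(row) if p > 0]
        if not successors:        # symbol not allowed here: probability zero
            return F(0)
        (nxt,) = successors       # unifilarity: at most one successor
        prob *= row[nxt]
        state = nxt
    return prob
```

For the null word both functions return 1, matching the convention $\mathbb{P}(\lambda|E_k) = 1$; for a word with a disallowed symbol both return 0, matching the case $u \neq \lambda$ above.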

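Lemma 5. For any $w \in \mathcal{X}^*$, $\mathbb{P}(w) = \|\mu T^{(w)}\|_1$.

Proof. Let $E_{k,w} = \{\overleftrightarrow{x} : \overleftarrow{x} \in E_k, \overrightarrow{x}^{|w|} = w\}$. Claim 14 of App. D shows that each $E_{k,w}$ is an $\mathbb{X}$-measurable set with $\mathbb{P}(E_{k,w}) = \mathbb{P}(E_k) \cdot \mathbb{P}(w|E_k)$. Since $\sum_{k=1}^{N} \mathbb{P}(E_k) = 1$, it follows that $\mathbb{P}(w) = \sum_{k=1}^{N} \mathbb{P}(E_{k,w})$ for each $w \in \mathcal{X}^*$. Thus, applying Lemma 4, for any $w \in \mathcal{X}^*$ we have:

$$\begin{aligned}
\mathbb{P}(w) &= \sum_{k=1}^{N} \mathbb{P}(E_{k,w}) \\
&= \sum_{k=1}^{N} \mathbb{P}(E_k) \cdot \mathbb{P}(w|E_k) \\
&= \sum_{k=1}^{N} \mu_k \|e_k T^{(w)}\|_1 \\
&= \|\mu T^{(w)}\|_1 .
\end{aligned}$$

□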
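The formula $\mathbb{P}(w) = \|\mu T^{(w)}\|_1$ lends itself to a quick sanity check: the probabilities of all words of a fixed length should sum to one, and each word probability should equal the sum over its one-symbol extensions. A minimal sketch follows, assuming the same hypothetical Even Process HMM and stationary distribution $\mu = (2/3, 1/3)$; neither is taken from this paper.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical unifilar HMM and its stationary distribution.
T = {
    "0": [[F(1, 2), F(0)], [F(0), F(0)]],
    "1": [[F(0), F(1, 2)], [F(1), F(0)]],
}
mu = [F(2, 3), F(1, 3)]

def word_prob(dist, T, w):
    """||dist T^(w)||_1, computed one symbol at a time."""
    vec = list(dist)
    n = len(vec)
    for x in w:
        vec = [sum(vec[i] * T[x][i][j] for i in range(n)) for j in range(n)]
    return sum(vec)

# Probabilities of all length-3 words under the stationary start.
probs3 = {"".join(w): word_prob(mu, T, w) for w in product("01", repeat=3)}
```

In this example the word `010` receives probability zero, since it contains an isolated `1` bounded by `0`s, which the Even Process forbids.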



Proof (Theorem 2).

1. Unifilarity: As mentioned in the proof of Lemma 4, this is immediate from the history ε-machine construction.

2. Probabilistically Distinct States: Take any $k, j$ with $k \neq j$. By construction of the equivalence classes there exists some word $w \in \mathcal{X}^*$ such that $\mathbb{P}(w|E_k) \neq \mathbb{P}(w|E_j)$. But by Lemma 4, $\mathbb{P}(w|E_k) = P_{E_k}(w)$ and $\mathbb{P}(w|E_j) = P_{E_j}(w)$. Hence, $P_{E_k}(w) \neq P_{E_j}(w)$, so the states $E_k$ and $E_j$ of the HMM $(\mathcal{E}^+, \mathcal{X}, \{T^{(x)}\})$ are probabilistically distinct. Since this holds for all $k \neq j$, the HMM has probabilistically distinct states.

3. Strongly Connected Graph: By Lemma 3, we know the graph consists of one or more connected components $C_1, \ldots, C_n$, each of which is strongly connected. Assume that there is more than one of these strongly connected components: $n \ge 2$. By Points 1 and 2 above we know that each component $C_i$ defines a generator ε-machine. If two of these components, say $C_i$ and $C_j$, were isomorphic via a function $f : C_i \text{ states} \to C_j \text{ states}$, then for states $E_k \in C_i$ and $E_l \in C_j$ with $f(E_k) = E_l$, we would have $P_{E_k}(w) = P_{E_l}(w)$ for all $w \in \mathcal{X}^*$. By Lemma 4, however, this implies $\mathbb{P}(w|E_k) = \mathbb{P}(w|E_l)$ for all $w \in \mathcal{X}^*$ as well, which contradicts the fact that $E_k$ and $E_l$ are distinct equivalence classes. Hence, no two of the components $C_i$, $i = 1, \ldots, n$, can be isomorphic. By Cor. 1, this implies that the stationary processes $\mathcal{P}^i$, $i = 1, \ldots, n$, generated by each of the generator ε-machine components are all distinct. But, by a block diagonalization argument, it follows from Lemma 5 that $\mathcal{P} = \sum_{i=1}^{n} \mu^i \cdot \mathcal{P}^i$, where $\mu^i = \sum_{\{k : E_k \in C_i\}} \mu_k$. That is, for any word $w \in \mathcal{X}^*$, we have:

$$\mathbb{P}(w) = \|\mu T^{(w)}\|_1 = \sum_{i=1}^{n} \mu^i \|\pi^i T^{i,(w)}\|_1 = \sum_{i=1}^{n} \mu^i \cdot \mathbb{P}^i(w) ,$$

where $\pi^i$ and $T^{i,(w)}$ are, respectively, the stationary state distribution and $w$-transition matrix for the generator ε-machine of component $C_i$. Since the $\mathcal{P}^i$s are all distinct, this implies that the process $\mathcal{P}$ cannot be ergodic, which is a contradiction. Hence, there can only be one strongly connected component $C_1$: the whole graph is strongly connected.

Equivalence of $\mathcal{P}$ and $\mathcal{P}'$: Since the graph of the HMM $(\mathcal{E}^+, \mathcal{X}, \{T^{(x)}\})$ is strongly connected, there is a unique stationary distribution $\pi$ over the states satisfying $\pi = \pi T$. But we already know the distribution $\mu$ is stationary. Hence, $\pi = \mu$. By definition, the word probabilities $\mathbb{P}'(w)$ for the process $\mathcal{P}'$ generated by this HMM are $\mathbb{P}'(w) = \|\pi T^{(w)}\|_1$, $w \in \mathcal{X}^*$. But, by Lemma 5, we also have $\mathbb{P}(w) = \|\mu T^{(w)}\|_1 = \|\pi T^{(w)}\|_1$ for each $w \in \mathcal{X}^*$. Hence, $\mathbb{P}(w) = \mathbb{P}'(w)$ for all $w \in \mathcal{X}^*$, so $\mathcal{P}$ and $\mathcal{P}'$ are the same process.

□
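The block diagonalization step in Point 3 can be illustrated numerically: when the state graph splits into non-communicating components, the word probabilities of the full HMM are a weighted mixture of the component word probabilities. The sketch below assumes a hypothetical two-component HMM in which each component is a single-state biased coin; the matrices, the component list, and the helper names are illustrative assumptions, not objects from the paper.

```python
from fractions import Fraction as F

# Hypothetical HMM whose two states never communicate (block-diagonal T).
T = {
    "0": [[F(1, 4), F(0)], [F(0), F(3, 4)]],
    "1": [[F(3, 4), F(0)], [F(0), F(1, 4)]],
}
mu = [F(1, 2), F(1, 2)]          # a stationary distribution for T
components = [[0], [1]]          # state indices of each component

def word_prob(dist, T, w):
    """||dist T^(w)||_1, computed one symbol at a time."""
    vec = list(dist)
    n = len(vec)
    for x in w:
        vec = [sum(vec[i] * T[x][i][j] for i in range(n)) for j in range(n)]
    return sum(vec)

def restrict(T, states):
    """Symbol-labeled matrices restricted to one component."""
    return {x: [[Tx[i][j] for j in states] for i in states]
            for x, Tx in T.items()}

def mixture_prob(mu, T, components, w):
    """sum_i mu^i * P^i(w), with mu^i the total weight of component i."""
    total = F(0)
    for states in components:
        weight = sum(mu[k] for k in states)
        pi_i = [mu[k] / weight for k in states]   # normalized restriction of mu
        total += weight * word_prob(pi_i, restrict(T, states), w)
    return total
```

In this toy case a realization comes entirely from one block, so the long-run frequency of `0` is 1/4 or 3/4 depending on the block rather than the ensemble value 1/2, which is the kind of nonergodic behavior the proof excludes.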



IV. CONCLUSION 



We demonstrated the equivalence of finite-state history and generator ε-machines. This idea is not new in the development of ε-machines, but a rigorous analysis of their equivalence was absent until quite recently. While the results of Ref. also imply equivalence, we feel that the proofs given here, especially for Thm. 1, are more direct and provide novel, constructive intuitions.

For example, the key step in proving the equivalence (or, at least, Thm. 1's new approach) came directly from recent bounds on synchronization rates for finite-state generator ε-machines. In addition, as we look forward to generalizing equivalence to larger classes, such as machines with a countably infinite number of states, it seems reasonable that one should attempt to deduce and apply similar synchronization results for countable-state generators. However, synchronization is more subtle for countable-state generators and, more to the point, exponential decay rates as in Lemma 1 no longer always hold. Thus, equivalence in the countable-state case is challenging. Fortunately, Ref. [5] indicates that it holds for countable-state machines if the entropy in the stationary distribution $H[\pi]$ is finite, which it often is.

Appendix A: Regular Pasts and Trivial Pasts



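We establish that the set of trivial pasts $\mathcal{T}$ is a null set and that the set of regular pasts $\mathcal{R}$ has full measure. Throughout this section $\mathcal{P} = (\mathcal{X}^{\mathbb{Z}}, \mathbb{X}, \mathbb{P})$ is a stationary, ergodic process over a finite alphabet $\mathcal{X}$, and $(\mathcal{X}^-, \mathbb{X}^-, \mathbb{P}^-)$ is the corresponding probability space over past sequences $\overleftarrow{x}$. Other notation is used as in Sec. II.

Claim 1. $\mathbb{P}^-$ a.e. $\overleftarrow{x}$ is nontrivial. That is, $\mathcal{T}$ is an $\mathbb{X}^-$ measurable set with $\mathbb{P}^-(\mathcal{T}) = 0$.

Proof. For any fixed $L$, $\mathcal{T}_L = \{\overleftarrow{x} : \mathbb{P}(\overleftarrow{x}^L) = 0\}$ is $\mathbb{X}^-$ measurable, since it is $\mathbb{X}^-_L$ measurable, and $\mathbb{P}^-(\mathcal{T}_L) = 0$. Hence, $\mathcal{T} = \bigcup_{L=1}^{\infty} \mathcal{T}_L$ is also $\mathbb{X}^-$ measurable with $\mathbb{P}^-(\mathcal{T}) = 0$. □

Claim 2. For any $w \in \mathcal{X}^*$, $\mathbb{P}^-$ a.e. $\overleftarrow{x}$ is $w$-regular. That is,

$$\mathcal{R}_w \equiv \{\overleftarrow{x} : \mathbb{P}(\overleftarrow{x}^L) > 0 \text{ for all } L \text{ and } \lim_{L \to \infty} \mathbb{P}(w|\overleftarrow{x}^L) \text{ exists}\}$$

is an $\mathbb{X}^-$ measurable set with $\mathbb{P}^-(\mathcal{R}_w) = 1$.

Proof. Fix $w \in \mathcal{X}^*$. Let $Y_{w,L} : \mathcal{X}^- \to \mathbb{R}$ be defined by:

$$Y_{w,L}(\overleftarrow{x}) = \begin{cases} \mathbb{P}(w|\overleftarrow{x}^L) & \text{if } \mathbb{P}(\overleftarrow{x}^L) > 0 , \\ 0 & \text{otherwise.} \end{cases}$$

Then, the sequence $(Y_{w,L})$ is a martingale with respect to the filtration $(\mathbb{X}^-_L)$ and $\mathbb{E}(Y_{w,L}) \le 1$ for all $L$. Hence, by the Martingale Convergence Theorem, $Y_{w,L} \to Y_w$ a.s. for some $\mathbb{X}^-$ measurable random variable $Y_w$. In particular, $\lim_{L \to \infty} Y_{w,L}(\overleftarrow{x})$ exists for $\mathbb{P}^-$ a.e. $\overleftarrow{x}$.

Let $\widehat{\mathcal{R}}_w = \{\overleftarrow{x} : \lim_{L \to \infty} Y_{w,L}(\overleftarrow{x}) \text{ exists}\}$. Then, as just shown, $\widehat{\mathcal{R}}_w$ is $\mathbb{X}^-$ measurable with $\mathbb{P}^-(\widehat{\mathcal{R}}_w) = 1$ and, from Claim 1, we know $\mathcal{T}$ is $\mathbb{X}^-$ measurable with $\mathbb{P}^-(\mathcal{T}) = 0$. Hence, $\mathcal{R}_w = \widehat{\mathcal{R}}_w \cap \mathcal{T}^c$ is also $\mathbb{X}^-$ measurable with $\mathbb{P}^-(\mathcal{R}_w) = 1$. □

Claim 3. $\mathbb{P}^-$ a.e. $\overleftarrow{x}$ is regular. That is, $\mathcal{R}$ is an $\mathbb{X}^-$ measurable set with $\mathbb{P}^-(\mathcal{R}) = 1$.

Proof. $\mathcal{R} = \bigcap_{w \in \mathcal{X}^*} \mathcal{R}_w$. By Claim 2, each $\mathcal{R}_w$ is $\mathbb{X}^-$ measurable with $\mathbb{P}^-(\mathcal{R}_w) = 1$. Since there are only countably many finite-length words $w \in \mathcal{X}^*$, it follows that $\mathcal{R}$ is also $\mathbb{X}^-$ measurable and $\mathbb{P}^-(\mathcal{R}) = 1$. □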



Appendix B: Well Definedness of Equivalence Class Transitions 



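We establish that the equivalence-class-to-equivalence-class transitions are well defined and normalized for the equivalence classes $E_\beta \in \mathcal{E}$. Throughout this section $\mathcal{P} = (\mathcal{X}^{\mathbb{Z}}, \mathbb{X}, \mathbb{P})$ is a stationary, ergodic process over a finite alphabet $\mathcal{X}$ and $(\mathcal{X}^-, \mathbb{X}^-, \mathbb{P}^-)$ is the corresponding probability space over past sequences $\overleftarrow{x}$. Other notation is used as in Sec. II. Recall that, by definition, for any regular past $\overleftarrow{x}$, $\mathbb{P}(\overleftarrow{x}^L) > 0$ for each $L \in \mathbb{N}$. This fact is used implicitly in the proofs of the following claims several times to ensure that various quantities involving $\overleftarrow{x}^L$ are well defined.

Claim 4. For any regular past $\overleftarrow{x} \in \mathcal{X}^-$ and word $w \in \mathcal{X}^*$ with $\mathbb{P}(w|\overleftarrow{x}) > 0$:
(i) $\mathbb{P}(\overleftarrow{x}^L w) > 0$ for each $L \in \mathbb{N}$ and
(ii) $\mathbb{P}(w|\overleftarrow{x}^L) > 0$ for each $L \in \mathbb{N}$.

Proof. Fix any regular past $\overleftarrow{x} \in \mathcal{X}^-$ and word $w \in \mathcal{X}^*$ with $\mathbb{P}(w|\overleftarrow{x}) > 0$. Assume there exists $L \in \mathbb{N}$ such that $\mathbb{P}(\overleftarrow{x}^L w) = 0$. Then $\mathbb{P}(\overleftarrow{x}^l w) = 0$ for all $l \ge L$ and, thus, $\mathbb{P}(w|\overleftarrow{x}^l) = \mathbb{P}(\overleftarrow{x}^l w)/\mathbb{P}(\overleftarrow{x}^l) = 0$ for all $l \ge L$ as well. Taking the limit gives $\mathbb{P}(w|\overleftarrow{x}) = \lim_{l \to \infty} \mathbb{P}(w|\overleftarrow{x}^l) = 0$, which is a contradiction. Hence, we must have $\mathbb{P}(\overleftarrow{x}^L w) > 0$ for each $L$, proving (i). (ii) follows since $\mathbb{P}(w|\overleftarrow{x}^L) = \mathbb{P}(\overleftarrow{x}^L w)/\mathbb{P}(\overleftarrow{x}^L)$ is greater than zero as long as $\mathbb{P}(\overleftarrow{x}^L w) > 0$. □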

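Claim 5. For any regular past $\overleftarrow{x} \in \mathcal{X}^-$ and any symbol $x \in \mathcal{X}$ with $\mathbb{P}(x|\overleftarrow{x}) > 0$, the past $\overleftarrow{x}x$ is regular.

Proof. Fix any regular past $\overleftarrow{x} \in \mathcal{X}^-$ and symbol $x \in \mathcal{X}$ with $\mathbb{P}(x|\overleftarrow{x}) > 0$. By Claim 4, $\mathbb{P}(\overleftarrow{x}^L x)$ and $\mathbb{P}(x|\overleftarrow{x}^L)$ are both nonzero for each $L \in \mathbb{N}$. Thus, the past $\overleftarrow{x}x$ is nontrivial and the conditional probability $\mathbb{P}(w|\overleftarrow{x}^L x)$ is well defined for each $w \in \mathcal{X}^*$, $L \in \mathbb{N}$ and given by:

$$\mathbb{P}(w|\overleftarrow{x}^L x) = \frac{\mathbb{P}(\overleftarrow{x}^L x w)}{\mathbb{P}(\overleftarrow{x}^L x)} = \frac{\mathbb{P}(xw|\overleftarrow{x}^L)}{\mathbb{P}(x|\overleftarrow{x}^L)} .$$

Moreover, since $\mathbb{P}(x|\overleftarrow{x}) > 0$, the quantity $\mathbb{P}(xw|\overleftarrow{x})/\mathbb{P}(x|\overleftarrow{x})$ is well defined for each $w \in \mathcal{X}^*$ and we have:

$$\begin{aligned}
\lim_{L \to \infty} \mathbb{P}(w|(\overleftarrow{x}x)^L) &= \lim_{L \to \infty} \mathbb{P}(w|\overleftarrow{x}^L x) \\
&= \lim_{L \to \infty} \frac{\mathbb{P}(xw|\overleftarrow{x}^L)}{\mathbb{P}(x|\overleftarrow{x}^L)} \\
&\overset{(*)}{=} \frac{\lim_{L \to \infty} \mathbb{P}(xw|\overleftarrow{x}^L)}{\lim_{L \to \infty} \mathbb{P}(x|\overleftarrow{x}^L)} \\
&= \frac{\mathbb{P}(xw|\overleftarrow{x})}{\mathbb{P}(x|\overleftarrow{x})} .
\end{aligned}$$

Step (*) is permissible because the limits in the numerator and the denominator are both known to exist separately (since $\overleftarrow{x}$ is regular), and the limit in the denominator, $\lim_{L \to \infty} \mathbb{P}(x|\overleftarrow{x}^L) = \mathbb{P}(x|\overleftarrow{x})$, is nonzero by assumption. From the last line we see that $\lim_{L \to \infty} \mathbb{P}(w|(\overleftarrow{x}x)^L) = \mathbb{P}(xw|\overleftarrow{x})/\mathbb{P}(x|\overleftarrow{x})$ exists. Since this holds for all $w \in \mathcal{X}^*$, the past $\overleftarrow{x}x$ is regular. □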

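Claim 6. If $\overleftarrow{x}$ and $\overleftarrow{x}'$ are two regular pasts in the same equivalence class $E_\beta \in \mathcal{E}$ then, for any symbol $x \in \mathcal{X}$ with $\mathbb{P}(x|E_\beta) > 0$, the regular pasts $\overleftarrow{x}x$ and $\overleftarrow{x}'x$ are also in the same equivalence class.

Proof. Let $E_\beta \in \mathcal{E}$ and fix any $\overleftarrow{x}, \overleftarrow{x}' \in E_\beta$ and $x \in \mathcal{X}$ with $\mathbb{P}(x|E_\beta) = \mathbb{P}(x|\overleftarrow{x}) = \mathbb{P}(x|\overleftarrow{x}') > 0$. By Claim 5, $\overleftarrow{x}x$ and $\overleftarrow{x}'x$ are both regular. And, just as in the proof of Claim 5, for any $w \in \mathcal{X}^*$ we have:

$$\mathbb{P}(w|\overleftarrow{x}x) = \lim_{L \to \infty} \mathbb{P}(w|(\overleftarrow{x}x)^L) = \frac{\mathbb{P}(xw|\overleftarrow{x})}{\mathbb{P}(x|\overleftarrow{x})} = \frac{\mathbb{P}(xw|E_\beta)}{\mathbb{P}(x|E_\beta)} .$$

Also, similarly, for any $w \in \mathcal{X}^*$:

$$\mathbb{P}(w|\overleftarrow{x}'x) = \lim_{L \to \infty} \mathbb{P}(w|(\overleftarrow{x}'x)^L) = \frac{\mathbb{P}(xw|\overleftarrow{x}')}{\mathbb{P}(x|\overleftarrow{x}')} = \frac{\mathbb{P}(xw|E_\beta)}{\mathbb{P}(x|E_\beta)} .$$

Since this holds for all $w \in \mathcal{X}^*$, it follows that $\overleftarrow{x}x$ and $\overleftarrow{x}'x$ are both in the same equivalence class. □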
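Claim 7. For any equivalence class $E_\beta$, $\sum_{x \in \mathcal{X}} \mathbb{P}(x|E_\beta) = 1$.

Proof. Fix $\overleftarrow{x} \in E_\beta$. Then:

$$\begin{aligned}
\sum_{x \in \mathcal{X}} \mathbb{P}(x|E_\beta) &= \sum_{x \in \mathcal{X}} \mathbb{P}(x|\overleftarrow{x}) \\
&= \sum_{x \in \mathcal{X}} \lim_{L \to \infty} \mathbb{P}(x|\overleftarrow{x}^L) \\
&= \lim_{L \to \infty} \sum_{x \in \mathcal{X}} \mathbb{P}(x|\overleftarrow{x}^L) \\
&= \lim_{L \to \infty} 1 \\
&= 1 .
\end{aligned}$$

□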



Appendix C: Measurability of Equivalence Classes 

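We establish that the equivalence classes $E_\beta$, $\beta \in B$, are measurable sets. Throughout this section $\mathcal{P} = (\mathcal{X}^{\mathbb{Z}}, \mathbb{X}, \mathbb{P})$ is a stationary, ergodic process over a finite alphabet $\mathcal{X}$ and $(\mathcal{X}^-, \mathbb{X}^-, \mathbb{P}^-)$ is the corresponding probability space over past sequences $\overleftarrow{x}$. Other notation is used as in Sec. II.

Claim 8. Let $A_{w,p} = \{\overleftarrow{x} : \mathbb{P}(\overleftarrow{x}^L) > 0 \text{ for all } L \text{ and } \lim_{L \to \infty} \mathbb{P}(w|\overleftarrow{x}^L) = p\}$. Then $A_{w,p}$ is $\mathbb{X}^-$ measurable for each $w \in \mathcal{X}^*$, $p \in [0, 1]$.

Proof. We proceed in steps through a series of intermediate sets.

• Let $A^+_{w,p,\epsilon,L} = \{\overleftarrow{x} : \mathbb{P}(\overleftarrow{x}^L) > 0, \mathbb{P}(w|\overleftarrow{x}^L) < p + \epsilon\}$ and $A^-_{w,p,\epsilon,L} = \{\overleftarrow{x} : \mathbb{P}(\overleftarrow{x}^L) > 0, \mathbb{P}(w|\overleftarrow{x}^L) > p - \epsilon\}$. Then $A^+_{w,p,\epsilon,L}$ and $A^-_{w,p,\epsilon,L}$ are both $\mathbb{X}^-$ measurable, since they are both $\mathbb{X}^-_L$ measurable.

• Let $A^+_{w,p,\epsilon} = \bigcup_{n=1}^{\infty} \bigcap_{L=n}^{\infty} A^+_{w,p,\epsilon,L} = \{\overleftarrow{x} : \mathbb{P}(\overleftarrow{x}^L) > 0 \text{ for all } L, \text{ there is an } n \in \mathbb{N} \text{ such that } \mathbb{P}(w|\overleftarrow{x}^L) < p + \epsilon \text{ for } L \ge n\}$. And, $A^-_{w,p,\epsilon} = \bigcup_{n=1}^{\infty} \bigcap_{L=n}^{\infty} A^-_{w,p,\epsilon,L} = \{\overleftarrow{x} : \mathbb{P}(\overleftarrow{x}^L) > 0 \text{ for all } L, \text{ there is an } n \in \mathbb{N} \text{ such that } \mathbb{P}(w|\overleftarrow{x}^L) > p - \epsilon \text{ for } L \ge n\}$. Then $A^+_{w,p,\epsilon}$ and $A^-_{w,p,\epsilon}$ are each $\mathbb{X}^-$ measurable since they are countable unions of countable intersections of $\mathbb{X}^-$ measurable sets.

• Let $A_{w,p,\epsilon} = A^+_{w,p,\epsilon} \cap A^-_{w,p,\epsilon} = \{\overleftarrow{x} : \mathbb{P}(\overleftarrow{x}^L) > 0 \text{ for all } L, \text{ there is an } n \in \mathbb{N} \text{ such that } |\mathbb{P}(w|\overleftarrow{x}^L) - p| < \epsilon \text{ for } L \ge n\}$. Then $A_{w,p,\epsilon}$ is $\mathbb{X}^-$ measurable since it is the intersection of two $\mathbb{X}^-$ measurable sets.

• Finally, note that $A_{w,p} = \bigcap_{m=1}^{\infty} A_{w,p,\epsilon_m}$, where $\epsilon_m = 1/m$. And, hence, $A_{w,p}$ is $\mathbb{X}^-$ measurable as it is a countable intersection of $\mathbb{X}^-$ measurable sets.

□

Claim 9. Any equivalence class $E_\beta \in \mathcal{E}$ is an $\mathbb{X}^-$ measurable set.

Proof. Fix any equivalence class $E_\beta \in \mathcal{E}$, and for $w \in \mathcal{X}^*$ let $p_w = \mathbb{P}(w|E_\beta)$. By definition $E_\beta = \bigcap_{w \in \mathcal{X}^*} A_{w,p_w}$ and, by Claim 8, each $A_{w,p_w}$ is $\mathbb{X}^-$ measurable. Thus, since there are only countably many finite-length words $w \in \mathcal{X}^*$, $E_\beta$ must also be $\mathbb{X}^-$ measurable. □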

Appendix D: Probabilistic Consistency of Equivalence Class Transitions 

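We establish that the probability of word generation from each equivalence class is consistent in the sense of Claims 12 and 14. Claim 14 is used in the proof of Claim 15 in App. E, and Claim 12 is used in the proof of Thm. 2. Throughout this section we assume $\mathcal{P} = (\mathcal{X}^{\mathbb{Z}}, \mathbb{X}, \mathbb{P})$ is a stationary, ergodic process over a finite alphabet $\mathcal{X}$ and denote the corresponding probability space over past sequences as $(\mathcal{X}^-, \mathbb{X}^-, \mathbb{P}^-)$. Other notation is as in Sec. II.

Claim 10. For any $E_\beta \in \mathcal{E}$ and $w, v \in \mathcal{X}^*$, $\mathbb{P}(wv|E_\beta) \le \mathbb{P}(w|E_\beta)$.

Proof. Fix $\overleftarrow{x} \in E_\beta$. Since $\mathbb{P}(wv|\overleftarrow{x}^L) \le \mathbb{P}(w|\overleftarrow{x}^L)$ for each $L$:

$$\mathbb{P}(wv|E_\beta) = \mathbb{P}(wv|\overleftarrow{x}) = \lim_{L \to \infty} \mathbb{P}(wv|\overleftarrow{x}^L) \le \lim_{L \to \infty} \mathbb{P}(w|\overleftarrow{x}^L) = \mathbb{P}(w|\overleftarrow{x}) = \mathbb{P}(w|E_\beta) .$$

□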



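Claim 11. Let $E_\beta \in \mathcal{E}$, $x \in \mathcal{X}$ with $\mathbb{P}(x|E_\beta) > 0$, and $E_\alpha = \delta(E_\beta, x)$. Then, $\mathbb{P}(xw|E_\beta) = \mathbb{P}(x|E_\beta) \cdot \mathbb{P}(w|E_\alpha)$ for any word $w \in \mathcal{X}^*$.

Proof. Fix $\overleftarrow{x} \in E_\beta$. Then $\overleftarrow{x}x \in E_\alpha$ is regular, so $\mathbb{P}(\overleftarrow{x}^L x) > 0$ for all $L$ and we have:

$$\begin{aligned}
\mathbb{P}(xw|E_\beta) &= \mathbb{P}(xw|\overleftarrow{x}) \\
&= \lim_{L \to \infty} \mathbb{P}(xw|\overleftarrow{x}^L) \\
&= \lim_{L \to \infty} \mathbb{P}(x|\overleftarrow{x}^L) \cdot \mathbb{P}(w|\overleftarrow{x}^L x) \\
&= \lim_{L \to \infty} \mathbb{P}(x|\overleftarrow{x}^L) \cdot \lim_{L \to \infty} \mathbb{P}(w|\overleftarrow{x}^L x) \\
&= \mathbb{P}(x|\overleftarrow{x}) \cdot \mathbb{P}(w|\overleftarrow{x}x) \\
&= \mathbb{P}(x|E_\beta) \cdot \mathbb{P}(w|E_\alpha) .
\end{aligned}$$

□

Claim 12. Let $w = w_0 \ldots w_{n-1} \in \mathcal{X}^*$ be a word of length $n \ge 1$, and let $w^m = w_0 \ldots w_{m-1}$ for $0 \le m \le n$. Assume that $\mathbb{P}(w^{n-1}|E_\beta) > 0$ for some $E_\beta \in \mathcal{E}$. Then, the equivalence classes $E_\beta^m$, $0 \le m \le n-1$, defined by the relations $E_\beta^0 = E_\beta$ and $E_\beta^m = \delta(E_\beta^{m-1}, w_{m-1})$ for $1 \le m \le n-1$, are well defined. That is, $\mathbb{P}(w_{m-1}|E_\beta^{m-1}) > 0$ for each $1 \le m \le n-1$. And, $\mathbb{P}(w|E_\beta) = \prod_{m=0}^{n-1} \mathbb{P}(w_m|E_\beta^m)$.

Here, $w^0 = \lambda$ is the null word and, for any equivalence class $E_\beta$, $\mathbb{P}(\lambda|E_\beta) = 1$.

Proof. For $|w| = 1$ the statement is immediate, and for $|w| = 2$ it reduces to Claim 11. For $|w| \ge 3$, it can be proved by induction on the length of $w$ using Claim 11 and the consistency bound provided by Claim 10, which guarantees $\mathbb{P}(w^m|E_\beta) > 0$ for each $m \le n-1$ if $\mathbb{P}(w^{n-1}|E_\beta) > 0$. □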



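The following theorem from [12, Chapter 4, Theorem 5.7] is needed in the proof of Claim 13. It is an application of the Martingale Convergence Theorem.

Theorem 3. Let $(\Omega, \mathcal{F}, P)$ be a probability space, and let $\mathcal{F}_1 \subseteq \mathcal{F}_2 \subseteq \mathcal{F}_3 \subseteq \cdots$ be an increasing sequence of $\sigma$-algebras on $\Omega$ with $\mathcal{F}_\infty = \sigma(\bigcup_{n=1}^{\infty} \mathcal{F}_n) \subseteq \mathcal{F}$. Suppose $X : \Omega \to \mathbb{R}$ is an $\mathcal{F}$-measurable random variable (with $E|X| < \infty$). Then, for (any versions of) the conditional expectations $E(X|\mathcal{F}_n)$ and $E(X|\mathcal{F}_\infty)$, we have:

$$E(X|\mathcal{F}_n) \to E(X|\mathcal{F}_\infty) \quad \text{a.s. and in } L^1 .$$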

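Claim 13. For any $w \in \mathcal{X}^*$, $P_w(\overleftrightarrow{x})$ is (a version of) the conditional expectation $\mathbb{E}(1_{A_{w,0}}|\mathbb{H})(\overleftrightarrow{x})$, where $P_w : \mathcal{X}^{\mathbb{Z}} \to [0, 1]$ is defined by:

$$P_w(\overleftrightarrow{x}) = \begin{cases} \mathbb{P}(w|\overleftarrow{x}) & \text{if } \overleftarrow{x} \text{ is regular (where } \overleftrightarrow{x} = \overleftarrow{x}\overrightarrow{x}\text{)}, \\ 0 & \text{otherwise.} \end{cases}$$

Proof. Fix $w \in \mathcal{X}^*$, and let $E_w$ be any fixed version of the conditional expectation $\mathbb{E}(1_{A_{w,0}}|\mathbb{H})$. Since the function $P_{w,L} : \mathcal{X}^{\mathbb{Z}} \to [0, 1]$ defined by:

$$P_{w,L}(\overleftrightarrow{x}) = \begin{cases} \mathbb{P}(w|\overleftarrow{x}^L) & \text{if } \mathbb{P}(\overleftarrow{x}^L) > 0 , \\ 0 & \text{otherwise} , \end{cases}$$

is a version of the conditional expectation $\mathbb{E}(1_{A_{w,0}}|\mathbb{H}_L)$, Thm. 3 implies that $P_{w,L}(\overleftrightarrow{x}) \to E_w(\overleftrightarrow{x})$ for $\mathbb{P}$ a.e. $\overleftrightarrow{x}$. Now, define:

$$V_w = \{\overleftrightarrow{x} : P_{w,L}(\overleftrightarrow{x}) \to E_w(\overleftrightarrow{x})\} \quad \text{and} \quad \widetilde{\mathcal{R}} = \{\overleftrightarrow{x} : \overleftarrow{x} \text{ is regular}\} .$$

By Claim 3, $\mathbb{P}^-(\mathcal{R}) = 1$, so we know $\mathbb{P}(\widetilde{\mathcal{R}}) = 1$. And, by the above, $\mathbb{P}(V_w) = 1$. Hence, $\mathbb{P}(W_w) = 1$, where:

$$W_w = V_w \cap \widetilde{\mathcal{R}} = \{\overleftrightarrow{x} : \overleftarrow{x} \text{ is regular}, \mathbb{P}(w|\overleftarrow{x}) = E_w(\overleftrightarrow{x})\} .$$

For each $\overleftrightarrow{x} \in W_w$, however, we have:

$$P_w(\overleftrightarrow{x}) = \mathbb{P}(w|\overleftarrow{x}) = E_w(\overleftrightarrow{x}) .$$

Thus, $P_w(\overleftrightarrow{x}) = E_w(\overleftrightarrow{x})$ for $\mathbb{P}$ a.e. $\overleftrightarrow{x}$, so for any $\mathbb{H}$ measurable set $H$, $\int_H P_w \, d\mathbb{P} = \int_H E_w \, d\mathbb{P}$. Furthermore, $P_w$ is $\mathbb{H}$ measurable since $P_{w,L} \to P_w$ and each $P_{w,L}$ is $\mathbb{H}$ measurable. It follows that $P_w(\overleftrightarrow{x})$ is a version of the conditional expectation $\mathbb{E}(1_{A_{w,0}}|\mathbb{H})$. □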

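Claim 14. For any equivalence class $E_\beta \in \mathcal{E}$ and word $w \in \mathcal{X}^*$, the set $E_{\beta,w} = \{\overleftrightarrow{x} : \overleftarrow{x} \in E_\beta, \overrightarrow{x}^{|w|} = w\}$ is $\mathbb{X}$ measurable with $\mathbb{P}(E_{\beta,w}) = \mathbb{P}(E_\beta) \cdot \mathbb{P}(w|E_\beta)$.

Proof. Let $\widetilde{E}_\beta = \{\overleftrightarrow{x} : \overleftarrow{x} \in E_\beta\}$. Then $\widetilde{E}_\beta$ and $A_{w,0}$ are both $\mathbb{X}$ measurable, so their intersection $E_{\beta,w}$ is as well. And, we have:

$$\begin{aligned}
\mathbb{P}(E_{\beta,w}) &= \int_{\widetilde{E}_\beta} 1_{A_{w,0}}(\overleftrightarrow{x}) \, d\mathbb{P} \\
&\overset{(a)}{=} \int_{\widetilde{E}_\beta} \mathbb{E}(1_{A_{w,0}}|\mathbb{H})(\overleftrightarrow{x}) \, d\mathbb{P} \\
&\overset{(b)}{=} \int_{\widetilde{E}_\beta} \mathbb{P}(w|E_\beta) \, d\mathbb{P} \\
&= \mathbb{P}(E_\beta) \cdot \mathbb{P}(w|E_\beta) ,
\end{aligned}$$

where (a) follows from the fact that $\widetilde{E}_\beta$ is $\mathbb{H}$ measurable and (b) follows from Claim 13.

□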



Appendix E: Finitely Characterized Processes 



We establish several results concerning finitely characterized processes. In particular, we show (Claim 17) that the HMM associated with the history ε-machine $M_h(\mathcal{P})$ is well defined. Throughout, we assume $\mathcal{P} = (\mathcal{X}^{\mathbb{Z}}, \mathbb{X}, \mathbb{P})$ is a stationary, ergodic, finitely characterized process over a finite alphabet $\mathcal{X}$ and denote the corresponding probability space over past sequences as $(\mathcal{X}^-, \mathbb{X}^-, \mathbb{P}^-)$. The set of positive probability equivalence classes is denoted $\mathcal{E}^+ = \{E_1, \ldots, E_N\}$ and the set of all equivalence classes as $\mathcal{E} = \{E_\beta, \beta \in B\}$. For equivalence classes $E_\beta, E_\alpha \in \mathcal{E}$ and symbol $x \in \mathcal{X}$, $I(x, \alpha, \beta)$ is the indicator of the transition from class $E_\alpha$ to class $E_\beta$ on symbol $x$:

$$I(x, \alpha, \beta) = \begin{cases} 1 & \text{if } \mathbb{P}(x|E_\alpha) > 0 \text{ and } \delta(E_\alpha, x) = E_\beta , \\ 0 & \text{otherwise.} \end{cases}$$

Finally, the symbol-labeled transition matrices $T^{(x)}$, $x \in \mathcal{X}$, between equivalence classes $E_1, \ldots, E_N$ are defined by $T^{(x)}_{ij} = \mathbb{P}(x|E_i) \cdot I(x, i, j)$. The overall transition matrix $T$ between these equivalence classes is $T = \sum_{x \in \mathcal{X}} T^{(x)}$.

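Claim 15. For any equivalence class $E_\beta \in \mathcal{E}$:

$$\mathbb{P}(E_\beta) = \sum_{k=1}^{N} \sum_{x \in \mathcal{X}} \mathbb{P}(E_k) \cdot \mathbb{P}(x|E_k) \cdot I(x, k, \beta) .$$

Proof. We have:

$$\begin{aligned}
\mathbb{P}(E_\beta) &\equiv \mathbb{P}(\{\overleftrightarrow{x} : \overleftarrow{x} \in E_\beta\}) \\
&\overset{(a)}{=} \mathbb{P}(\{\overleftrightarrow{x} : \overleftarrow{x} x_0 \in E_\beta\}) \\
&\overset{(b)}{=} \sum_{k=1}^{N} \mathbb{P}(\{\overleftrightarrow{x} : \overleftarrow{x} x_0 \in E_\beta, \overleftarrow{x} \in E_k\}) \\
&= \sum_{k=1}^{N} \sum_{x \in \mathcal{X}} \mathbb{P}(\{\overleftrightarrow{x} : \overleftarrow{x} x_0 \in E_\beta, \overleftarrow{x} \in E_k, x_0 = x\}) \\
&= \sum_{k=1}^{N} \sum_{x \in \mathcal{X}} \mathbb{P}(\{\overleftrightarrow{x} : \overleftarrow{x} \in E_k, x_0 = x\}) \cdot I(x, k, \beta) \\
&= \sum_{k=1}^{N} \sum_{x \in \mathcal{X}} \mathbb{P}(E_{k,x}) \cdot I(x, k, \beta) \\
&\overset{(c)}{=} \sum_{k=1}^{N} \sum_{x \in \mathcal{X}} \mathbb{P}(E_k) \cdot \mathbb{P}(x|E_k) \cdot I(x, k, \beta) ,
\end{aligned}$$

where (a) follows from stationarity, (b) from the fact that $\sum_{k=1}^{N} \mathbb{P}(E_k) = 1$, and (c) from Claim 14. □

Claim 16. For any $E_k \in \mathcal{E}^+$ and symbol $x$ with $\mathbb{P}(x|E_k) > 0$, $\delta(E_k, x) \in \mathcal{E}^+$.

Proof. Fix $E_k \in \mathcal{E}^+$ and $x \in \mathcal{X}$ with $\mathbb{P}(x|E_k) > 0$. By Claim 15, $\mathbb{P}(\delta(E_k, x)) \ge \mathbb{P}(E_k) \cdot \mathbb{P}(x|E_k) > 0$. Hence, $\delta(E_k, x) \in \mathcal{E}^+$. □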

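Claim 17. The transition matrix $T = \sum_{x \in \mathcal{X}} T^{(x)}$ is stochastic: $\sum_{j=1}^{N} T_{ij} = 1$ for each $1 \le i \le N$. Hence, the HMM $(\mathcal{E}^+, \mathcal{X}, \{T^{(x)}\}) \equiv M_h(\mathcal{P})$ is well defined.

Proof. This follows directly from Claims 7 and 16. □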
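The construction $T^{(x)}_{ij} = \mathbb{P}(x|E_i) \cdot I(x, i, j)$ and the row-sum property of Claim 17 can be sketched in a few lines. The inputs below, `symbol_prob` (the conditional symbol probabilities of each class) and `delta` (the class-to-class transition function), are hypothetical values for a two-class example and are not taken from the paper; the function names are likewise illustrative.

```python
from fractions import Fraction as F

# Hypothetical inputs for two classes E_0 and E_1:
# symbol_prob[i][x] plays the role of P(x | E_i),
# delta[i][x] the index of the class reached from E_i on symbol x.
symbol_prob = [{"0": F(1, 2), "1": F(1, 2)}, {"1": F(1)}]
delta = [{"0": 0, "1": 1}, {"1": 0}]

def build_matrices(symbol_prob, delta, alphabet):
    """Build T^(x)_ij = P(x|E_i) * I(x, i, j) for each symbol x."""
    n = len(symbol_prob)
    T = {x: [[F(0)] * n for _ in range(n)] for x in alphabet}
    for i in range(n):
        for x, p in symbol_prob[i].items():
            if p > 0:                       # indicator is 1 only if P(x|E_i) > 0
                T[x][i][delta[i][x]] = p    # and j is the unique successor class
    return T

def row_sums(T):
    """Row sums of the overall matrix T = sum_x T^(x)."""
    n = len(next(iter(T.values())))
    return [sum(Tx[i][j] for Tx in T.values() for j in range(n))
            for i in range(n)]

T = build_matrices(symbol_prob, delta, "01")
```

Each row sums to one exactly when the symbol probabilities of each class are normalized (Claim 7) and every allowed transition lands on a retained class (Claim 16), which are the two ingredients the proof cites.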